Publications

Sampling operations on big data

November 8, 2015

Conference Paper

Author:

Vijay N. Gadepally

…

Published in:

2015 Asilomar Conf. on Signals, Systems and Computers, 8-11 November 2015.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Secure Resilient Systems and Technology

Summary

The 3Vs -- Volume, Velocity and Variety -- of Big Data continue to be a large challenge for systems and algorithms designed to store, process and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals, are often undefined in data that is characterized by heterogeneity, high dimensionality, and lack of known structure. In this article, we describe and demonstrate an approach to sample large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that heavily sampling a dataset can still yield meaningful link prediction results.

Sampling large graphs for anticipatory analytics

September 15, 2015

Conference Paper

Author:

Lauren Milechin

…

Published in:

HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Secure Resilient Systems and Technology

Summary

The characteristics of Big Data - often dubbed the 3V's for volume, velocity, and variety - will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include the purchase of larger systems, greater human-in-the-loop involvement, or more complex algorithms. We are investigating the use of sampling to mitigate these challenges, specifically sampling large graphs. Often, large datasets can be represented as graphs where data entries may be edges, and vertices may be attributes of the data. In particular, we present the results of sampling for the task of link prediction. Link prediction is a process to estimate the probability of a new edge forming between two vertices of a graph, and it has numerous application areas in understanding social or biological networks. In this paper we propose a series of techniques for the sampling of large datasets. In order to quantify the effect of these techniques, we present the quality of link prediction tasks on sampled graphs, and the time saved in calculating link prediction statistics on these sampled graphs.
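
As a rough illustration of the idea in this summary, the sketch below samples the edges of a small graph uniformly at random and scores candidate links with the Jaccard coefficient of the two vertices' neighbor sets. Both choices are assumptions made for illustration: the summary does not say which sampling techniques or link prediction statistics the paper uses, and the function names (`sample_edges`, `jaccard_scores`) and the toy graph are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_edges(edges, fraction, seed=0):
    """Keep a uniformly random fraction of the edges (assumed sampling scheme)."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(edges)))
    return rng.sample(edges, k)


def adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard_scores(edges):
    """Score every non-adjacent vertex pair by the Jaccard similarity
    of their neighbor sets (one common link prediction heuristic)."""
    adj = adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # an edge already exists, nothing to predict
        union = adj[u] | adj[v]
        scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores


# Hypothetical toy graph, not data from the paper.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
full = jaccard_scores(edges)
sampled = jaccard_scores(sample_edges(edges, 0.5, seed=1))
```

Comparing `full` with `sampled` for the same vertex pairs gives a simple sense of how much prediction quality is lost in exchange for scoring a smaller graph.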

Enabling on-demand database computing with MIT SuperCloud database management system

September 15, 2015

Conference Paper

Author:

Andrew J. Prout

…

Published in:

HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Topic:

high performance computing

R&D area:

Cyber Security and Information Sciences

R&D group:

Secure Resilient Systems and Technology

Summary

The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss if the database becomes unstable.

Global pattern search at scale

April 14, 2015

Conference Paper

Author:

Ryan Jordan Crouser

…

Published in:

IEEE Int. Symp. on Technologies for Homeland Security, 14-16 April 2015.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to the complexities of data processing, presenting the data in a way that is meaningful to the end user poses another challenge. In our work, we focused on analyzing a corpus of 35,000 news articles and creating an interactive geovisualization tool to reveal patterns to human analysts. Our comprehensive tool, Global Pattern Search at Scale (GPSS), addresses three major problems in data analysis: free text analysis, high volumes of data, and interactive visualization. GPSS uses an Accumulo database for high-volume data storage, and a matrix of word counts and event detection algorithms to process the free text. For visualization, the tool displays an interactive web application to the user, featuring a map overlaid with document clusters and events, search and filtering options, a timeline, and a word cloud. In addition, the GPSS tool can be easily adapted to process and understand other large free-text datasets.
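
The "matrix of word counts" mentioned in this summary can be pictured with a minimal sketch like the one below. It is not the GPSS implementation: the whitespace tokenization, the function name `word_count_matrix`, and the sample documents are all assumptions for illustration, and the Accumulo storage and event detection steps are not shown.

```python
from collections import Counter


def word_count_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the number of times
    vocab[j] occurs in docs[i]. Tokenization is a plain lowercase
    whitespace split, which is an assumption, not the paper's method."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocab] for c in counts]
    return vocab, matrix


# Hypothetical documents, not taken from the 35,000-article corpus.
docs = ["flood in Boston", "Boston flood warning flood"]
vocab, matrix = word_count_matrix(docs)
```

Each row describes one document and each column one word, so documents with similar rows can be clustered or searched together.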
