Publications
Tagged As
Temporal and multi-source fusion for detection of innovation in collaboration networks
Summary
Summary
A common problem in network analysis is detecting small subgraphs of interest within a large background graph. This includes multi-source fusion scenarios where data from several modalities must be integrated to form the network. This paper presents an application of novel techniques leveraging the signal processing for graphs algorithmic framework...
Global pattern search at scale
Summary
Summary
In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to...
Computing on Masked Data to improve the security of big data
Summary
Summary
Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need for improving the security of data stored in such...
Rapid sequence identification of potential pathogens using techniques from sparse linear algebra
Summary
Summary
The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing has rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling...
Using a big data database to identify pathogens in protein data space [e-print]
Summary
Summary
Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors/false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate...
NEU_MITLL @ TRECVid 2015: multimedia event detection by pre-trained CNN models
Summary
Summary
We introduce a framework for multimedia event detection (MED), which was developed for TRECVID 2015 using convolutional neural networks (CNNs) to detect complex events via deterministic models trained on video frame data. We used several well-known CNN models designed to detect objects, scenes, and a combination of both (i.e., Hybrid-CNN)...
Spectral anomaly detection in very large graphs: Models, noise, and computational complexity(92.92 KB)
Summary
Summary
Anomaly detection in massive networks has numerous theoretical and computational challenges, especially as the behavior to be detected becomes small in comparison to the larger network. This presentation focuses on recent results in three key technical areas, specifically geared toward spectral methods for detection.
Sparse matrix partitioning for parallel eigenanalysis of large static and dynamic graphs
Summary
Summary
Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network...
Big Data dimensional analysis
Summary
Summary
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. One of the main challenges associated with big data...
Achieving 100,000,000 database inserts per second using Accumulo and D4M
Summary
Summary
The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed...