Publications
Global pattern search at scale
Summary
In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to...
Spectral anomaly detection in very large graphs: Models, noise, and computational complexity
Summary
Anomaly detection in massive networks has numerous theoretical and computational challenges, especially as the behavior to be detected becomes small in comparison to the larger network. This presentation focuses on recent results in three key technical areas, specifically geared toward spectral methods for detection.
Sparse matrix partitioning for parallel eigenanalysis of large static and dynamic graphs
Summary
Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network...
Spectral subgraph detection with corrupt observations
Summary
Recent work on signal detection in graph-based data focuses on classical detection when the signal and noise are both in the form of discrete entities and their relationships. In practice, the relationships of interest may not be directly observable, or may be observed through a noisy mechanism. The effects of...
Effective parallel computation of eigenpairs to detect anomalies in very large graphs
Summary
The computational driver for an important class of graph analysis algorithms is the computation of leading eigenvectors of matrix representations of the graph. In this presentation, we discuss the challenges of calculating eigenvectors of modularity matrices derived from very large graphs (upwards of a billion vertices) and demonstrate the scaling...
Very large graphs for information extraction (VLG) - summary of first-year proof-of-concept study
Summary
In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel...
Efficient anomaly detection in dynamic, attributed graphs: emerging phenomena and big data
Summary
When working with large-scale network data, the interconnected entities often have additional descriptive information. This additional metadata may provide insight that can be exploited for detection of anomalous events. In this paper, we use a generalized linear model for random attributed graphs to model connection probabilities using vertex metadata. For...
Sparse Volterra systems: theory and practice
Summary
Nonlinear effects limit analog circuit performance, causing both in-band and out-of-band distortion. The classical Volterra series provides an accurate model of many nonlinear systems, but the number of parameters grows extremely quickly as the memory depth and polynomial order are increased. Recently, concepts from compressed sensing have been applied to...
Detection theory for graphs
Summary
Graphs are fast emerging as a common data structure used in many scientific and engineering fields. While a wide variety of techniques exist to analyze graph datasets, practitioners currently lack a signal processing theory akin to that of detection and estimation in the classical setting of vector spaces with Gaussian...
Characterization of traffic and structure in the U.S. airport network
Summary
In this paper we seek to characterize traffic in the U.S. air transportation system, and to subsequently develop improved models of traffic demand. We model the air traffic within the U.S. national airspace system as a dynamic weighted network. We employ techniques advanced by work in complex networks over the past...