Publications
LLSuperCloud: sharing HPC systems for diverse rapid prototyping
Summary
Summary
The supercomputing and enterprise computing arenas come from very different lineages. However, the advent of commodity computing servers has brought the two arenas closer than they have ever been. Within enterprise computing, commodity computing servers have resulted in the development of a wide range of new cloud capabilities: elastic computing...
D4M 2.0 Schema: a general purpose high performance schema for the Accumulo database
Summary
Summary
Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using...
Very large graphs for information extraction (VLG) - summary of first-year proof-of-concept study
Summary
Summary
In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel...
A language-independent approach to automatic text difficulty assessment for second-language learners
Summary
Summary
In this paper we introduce a new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale. We demonstrate that reading level assessment is a discriminative problem that is best-suited for regression. Our baseline uses z-normalized shallow length features and TF-LOG weighted vectors on bag-of-words...
Efficient anomaly detection in dynamic, attributed graphs: emerging phenomena and big data
Summary
Summary
When working with large-scale network data, the interconnected entities often have additional descriptive information. This additional metadata may provide insight that can be exploited for detection of anomalous events. In this paper, we use a generalized linear model for random attributed graphs to model connection probabilities using vertex metadata. For...
Probabilistic threat propagation for malicious activity detection
Summary
Summary
In this paper, we present a method for detecting malicious activity within networks of interest. We leverage prior community detection work by propagating threat probabilities across graph nodes, given an initial set of known malicious nodes. We enhance prior work by employing constraints which remove the adverse effect of cyclic...
Link prediction methods for generating speaker content graphs
Summary
Summary
In a speaker content graph, vertices represent speech signals and edges represent speaker similarity. Link prediction methods calculate which potential edges are most likely to connect vertices from the same speaker; those edges are included in the generated speaker content graph. Since a variety of speaker recognition tasks can be...
Large-scale community detection on speaker content graphs
Summary
Summary
We consider the use of community detection algorithms to perform speaker clustering on content graphs built from large audio corpora. We survey the application of agglomerative hierarchical clustering, modularity optimization methods, and spectral clustering as well as two random walk algorithms: Markov clustering and Infomap. Our results on graphs built...
Sparse volterra systems: theory and practice
Summary
Summary
Nonlinear effects limit analog circuit performance, causing both in-band and out-of-band distortion. The classical Volterra series provides an accurate model of many nonlinear systems, but the number of parameters grows extremely quickly as the memory depth and polynomial order are increased. Recently, concepts from compressed sensing have been applied to...
An Expectation Maximization Approach to Detecting Compromised Remote Access Accounts(267.16 KB)
Summary
Summary
Just as credit-card companies are able to detect aberrant transactions on a customer’s credit card, it would be useful to have methods that could automatically detect when a user’s login credentials for Virtual Private Network (VPN) access have been compromised. We present here a novel method for detecting that a...