Publications

Refine Results

(Filters Applied) Clear All

Dynamic Distributed Dimensional Data Model (D4M) database and computation system

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language "SQL" databases). D4M allows linear algebra to be readily applied to databases. Using D4M, it is possible to create composable analytics with significantly less effort than using traditional approaches. This work describes the D4M technology and its application and performance.
READ LESS

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but...

READ MORE

Creating a cyber moving target for critical infrastructure applications using platform diversity

Published in:
Int. J. of Critical Infrastructure Protection, Vol. 5, No. 1, March 2012, pp. 30-39.

Summary

Despite the significant effort that often goes into securing critical infrastructure assets, many systems remain vulnerable to advanced, targeted cyber attacks. This paper describes the design and implementation of the Trusted Dynamic Logical Heterogeneity System (TALENT), a framework for live-migrating critical infrastructure applications across heterogeneous platforms. TALENT permits a running critical application to change its hardware platform and operating system, thus providing cyber survivability through platform diversity. TALENT uses containers (operating-system-level virtualization) and a portable checkpoint compiler to create a virtual execution environment and to migrate a running application across different platforms while preserving the state of the application (execution state, open files and network connections). TALENT is designed to support general applications written in the C programming language. By changing the platform on-the-fly, TALENT creates a cyber moving target and significantly raises the bar for a successful attack against a critical application. Experiments demonstrate that a complete migration can be completed within about one second.
READ LESS

Summary

Despite the significant effort that often goes into securing critical infrastructure assets, many systems remain vulnerable to advanced, targeted cyber attacks. This paper describes the design and implementation of the Trusted Dynamic Logical Heterogeneity System (TALENT), a framework for live-migrating critical infrastructure applications across heterogeneous platforms. TALENT permits a running...

READ MORE

Fundamental Questions in the Analysis of Large Graphs

Summary

Graphs are a general approach for representing information that spans the widest possible range of computing applications. They are particularly important to computational biology, web search, and knowledge discovery. As the sizes of graphs increase, the need to apply advanced mathematical and computational techniques to solve these problems is growing dramatically. Examining the mathematical and computational foundations of the analysis of large graphs generally leads to more questions than answers. This book concludes with a discussion of some of these questions.
READ LESS

Summary

Graphs are a general approach for representing information that spans the widest possible range of computing applications. They are particularly important to computational biology, web search, and knowledge discovery. As the sizes of graphs increase, the need to apply advanced mathematical and computational techniques to solve these problems is growing...

READ MORE

Visualizing Large Kronecker Graphs

Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 241-250.

Summary

Kronecker graphs have been shown to be one of the most promising models for real-world networks. Visualization of Kronecker graphs is an important challenge. This chapter describes an interactive framework to assist scientists and engineers in generating, analyzing, and visualizing Kronecker graphs with as little effort as possible.
READ LESS

Summary

Kronecker graphs have been shown to be one of the most promising models for real-world networks. Visualization of Kronecker graphs is an important challenge. This chapter describes an interactive framework to assist scientists and engineers in generating, analyzing, and visualizing Kronecker graphs with as little effort as possible.

READ MORE

A knowledge-based operator for a genetic algorithm which optimizes the distribution of sparse matrix data

Published in:
Parallel Architectures and Bioinspired Algorithms

Summary

We present the Hogs and Slackers genetic algorithm (GA) which addresses the problem of improving the parallelization efficiency of sparse matrix computations by optimally distributing blocks of matrices data. The performance of a distribution is sensitive to the non-zero patterns in the data, the algorithm, and the hardware architecture. In a candidate distributions the Hogs and Slackers GA identifies processors with many operations – hogs, and processors with fewer operations – slackers. Its intelligent operation-balancing mutation operator then swaps data blocks between hogs and slackers to explore a new data distribution.We show that the Hogs and Slackers GA performs better than a baseline GA. We demonstrate Hogs and Slackers GA’s optimization capability with an architecture study of varied network and memory bandwidth and latency.
READ LESS

Summary

We present the Hogs and Slackers genetic algorithm (GA) which addresses the problem of improving the parallelization efficiency of sparse matrix computations by optimally distributing blocks of matrices data. The performance of a distribution is sensitive to the non-zero patterns in the data, the algorithm, and the hardware architecture. In...

READ MORE

Linear algebraic notation and definitions

Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 13-18.

Summary

This chapter presents notation, definitions, and conventions for graphs, matrices, arrays, and operations upon them.
READ LESS

Summary

This chapter presents notation, definitions, and conventions for graphs, matrices, arrays, and operations upon them.

READ MORE

Subgraph Detection

Author:
Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 115-133.

Summary

Detecting subgraphs of interest in larger graphs is the goal of many graph analysis techniques. The basis of detection theory is computing the probability of a “foreground” with respect to a model of the “background” data. Hidden Markov Models represent one possible foreground model for patterns of interaction in a graph. Likewise, Kronecker graphs are one possible model for power law background graphs. Combining these models allows estimates of the signal to noise ratio, probability of detection, and probability of false alarm for different classes of vertices in the foreground. These estimates can then be used to construct filters for computing the probability that a background graph contains a particular foreground graph. This approach is applied to the problem of detecting a partially labeled tree graph in a power law background graph. One feature of this method is the ability to a priori estimate the number of vertices that will be detected via the filter.
READ LESS

Summary

Detecting subgraphs of interest in larger graphs is the goal of many graph analysis techniques. The basis of detection theory is computing the probability of a “foreground” with respect to a model of the “background” data. Hidden Markov Models represent one possible foreground model for patterns of interaction in a...

READ MORE

The Kronecker theory of power law graphs

Author:
Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 205-220.

Summary

An analytical theory of power law graphs is presented based on the Kronecker graph generation technique. Explicit, stochastic, and instance Kronecker graphs are used to highlight different properties. The analysis uses Kronecker exponentials of complete bipartite graphs to formulate the substructure of such graphs. The Kronecker theory allows various high-level quantities (e.g., degree distribution, betweenness centrality, diameter, eigenvalues, and iso-parametric ratio) to be computed directly from the model parameters.
READ LESS

Summary

An analytical theory of power law graphs is presented based on the Kronecker graph generation technique. Explicit, stochastic, and instance Kronecker graphs are used to highlight different properties. The analysis uses Kronecker exponentials of complete bipartite graphs to formulate the substructure of such graphs. The Kronecker theory allows various high-level...

READ MORE

Graphs and matrices

Author:
Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 3-12

Summary

A linear algebraic approach to graph algorithms that exploits the sparse adjacency matrix representation of graphs can provide a variety of benefits. These benefits include syntactic simplicity, easier implementation, and higher performance. Selected examples are presented illustrating these benefits. These examples are drawn from the remainder of the book in the areas of algorithms, data analysis, and computation.
READ LESS

Summary

A linear algebraic approach to graph algorithms that exploits the sparse adjacency matrix representation of graphs can provide a variety of benefits. These benefits include syntactic simplicity, easier implementation, and higher performance. Selected examples are presented illustrating these benefits. These examples are drawn from the remainder of the book in...

READ MORE

Topic modeling for spoken documents using only phonetic information

Published in:
ASRU 2011, IEEE Workshop on Automatic Speech Recognition & Understanding, 11-15 December 2011, pp. 395-400.

Summary

This paper explores both supervised and unsupervised topic modeling for spoken audio documents using only phonetic information. In cases where word-based recognition is unavailable or infeasible, phonetic information can be used to indirectly learn and capture information provided by topically relevant lexical items. In some situations, a lack of transcribed data can prevent supervised training of a same-language phonetic recognition system. In these cases, phonetic recognition can use cross-language models or self-organizing units (SOUs) learned in a completely unsupervised fashion. This paper presents recent improvements in topic modeling using only phonetic information. We present new results using recently developed techniques for discriminative training for topic identification used in conjunction with recent improvements in SOU learning. A preliminary examination of the use of unsupervised latent topic modeling for unsupervised discovery of topics and topically relevant lexical items from phonetic information is also presented.
READ LESS

Summary

This paper explores both supervised and unsupervised topic modeling for spoken audio documents using only phonetic information. In cases where word-based recognition is unavailable or infeasible, phonetic information can be used to indirectly learn and capture information provided by topically relevant lexical items. In some situations, a lack of transcribed...

READ MORE