Publications

P-sync: a photonically enabled architecture for efficient non-local data access

Summary

Communication in multi- and many-core processors has long been a bottleneck to performance due to the high cost of long-distance electrical transmission. This difficulty has been partially remedied by architectural constructs such as caches and novel interconnect topologies, albeit at a steep cost in complexity. Unfortunately, even these measures are rendered ineffective by certain kinds of communication, most notably scatter and gather operations that exhibit highly non-local data access patterns. Much work has gone into examining how the increased bandwidth density afforded by chip-scale silicon photonic interconnect technologies affects computing, but photonics have additional properties that can be leveraged to greatly improve performance and energy efficiency under such difficult loads. This paper describes a novel synchronized global photonic bus and system architecture called P-sync that uses photonics' distance independence to greatly improve performance on many important applications previously limited by electronic interconnect. The architecture is evaluated in the context of a non-local yet common application: the distributed Fast Fourier Transform. We show that it is possible to achieve high efficiency by tightly balancing computation and communication latency in P-sync, and we achieve upwards of a 6x performance improvement on gather patterns, even when bandwidth is equalized.
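
As context for the gather pattern the abstract refers to, the sketch below walks through a four-step distributed FFT, whose global transpose is exactly the kind of strided, non-local access that stresses an electrical interconnect. This is purely illustrative: the problem size, node count, and block distribution are assumptions, and the code says nothing about the P-sync hardware itself.

```python
# Illustrative only (not the P-sync design): the four-step distributed FFT whose
# global transpose produces the non-local gather pattern discussed above.
import numpy as np

n1, n2 = 8, 8
n = n1 * n2
x = np.random.rand(n) + 1j * np.random.rand(n)

A = x.reshape(n2, n1).T                      # A[j1, j2] = x[j1 + n1*j2]; rows of A
                                             # are block-distributed across the nodes
B = np.fft.fft(A, axis=1)                    # step 1: FFTs over purely local data
j1 = np.arange(n1).reshape(-1, 1)
k2 = np.arange(n2).reshape(1, -1)
C = B * np.exp(-2j * np.pi * j1 * k2 / n)    # step 2: local twiddle multiply

# Step 3: global transpose.  With rows block-distributed across nodes, every node
# must gather elements strided through every other node's memory -- the step that
# dominates when the interconnect is electrical and distance-dependent.
Ct = C.T

D = np.fft.fft(Ct, axis=1)                   # step 4: FFTs over the newly local rows
X = D.flatten(order='F')                     # un-permute the four-step output order
assert np.allclose(X, np.fft.fft(x))         # matches a direct FFT of the input
```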

LLGrid: supercomputer for sensor processing

Summary

MIT Lincoln Laboratory is a federally funded research and development center that applies advanced technology to problems of national interest. Research and development activities focus on long-term technology development as well as rapid system prototyping and demonstration. A key part of this mission is to develop and deploy advanced sensor systems. Developing the algorithms for these systems requires interactive access to large-scale computing and data storage. Deploying these systems requires that the computing and storage capabilities are transportable and energy efficient. The LLGrid system of supercomputers allows hundreds of researchers simultaneous interactive access to large amounts of processing and storage for development and testing of their sensor processing algorithms. The requirements of the LLGrid user base are as diverse as the sensors they are developing: sonar, radar, infrared, optical, hyperspectral, video, bio and cyber. However, there are two common elements: delivering large amounts of data interactively to many processors and providing high-level user interfaces that require minimal user training. The LLGrid software stack provides these capabilities on dozens of LLGrid computing clusters across Lincoln Laboratory. LLGrid systems range from very small (a few nodes) to very large (40+ racks).

Architecture-independent dynamic information flow tracking

Published in:
CC 2013: 22nd Int. Conf. on Compiler Construction, 16-24 March 2013, pp. 144-163.

Summary

Dynamic information flow tracking is a well-known dynamic software analysis technique with a wide variety of applications that range from making systems more secure, to helping developers and analysts better understand the code that systems are executing. Traditionally, the fine-grained analysis capabilities that are desired for the class of these systems that operate at the binary level require tight coupling to a specific ISA. This places a heavy burden on developers of these systems since significant domain knowledge is required to support each ISA, and the effort expended on one ISA implementation cannot be amortized to support other ISAs. Further, the correctness of the system must be carefully evaluated for each new ISA. In this paper, we present a general approach to information flow tracking that allows us to support multiple ISAs without mastering the intricate details of each ISA we support, and without extensive verification. Our approach leverages binary translation to an intermediate representation where we have developed detailed, architecture-neutral information flow models. To support advanced instructions that are typically implemented in C code in binary translators, we also present a combined static/dynamic analysis that allows us to accurately and automatically support these instructions. We demonstrate the utility of our system in three different application settings: enforcing information flow policies, classifying algorithms by information flow properties, and characterizing types of programs that may exhibit excessive information flow in an information flow tracking system.
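
To make the central idea concrete, here is a minimal sketch of taint propagation written once against a tiny, made-up intermediate representation, so the same flow rules apply regardless of the source ISA. The three-op IR, register names, and labels below are invented for illustration and are not the paper's actual IR or flow models.

```python
# Illustrative sketch only: architecture-neutral taint propagation over a toy IR.
from dataclasses import dataclass

@dataclass
class Insn:
    op: str          # 'mov', 'load', or 'add' in this toy IR
    dst: str
    srcs: tuple

def propagate(insns, taint):
    """taint maps register/memory names to sets of taint labels."""
    for i in insns:
        if i.op in ('mov', 'load'):            # data flows straight through
            taint[i.dst] = set(taint.get(i.srcs[0], set()))
        elif i.op == 'add':                    # result depends on both operands
            taint[i.dst] = taint.get(i.srcs[0], set()) | taint.get(i.srcs[1], set())
    return taint

# 'input0' marks data read from an untrusted source.
program = [
    Insn('load', 'r1', ('mem_user',)),   # r1 <- untrusted input
    Insn('mov',  'r2', ('r1',)),
    Insn('add',  'r3', ('r2', 'r4')),    # r3 combines tainted and clean data
]
labels = propagate(program, {'mem_user': {'input0'}, 'r4': set()})
print(labels['r3'])   # {'input0'}: the input label reached r3
```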

RECOG: Recognition and Exploration of Content Graphs

Published in:
Pacific Vision, 26 February - 1 March 2013.

Summary

We present RECOG (Recognition and Exploration of COntent Graphs), a system for visualizing and interacting with speaker content graphs constructed from large data sets of speech recordings. In a speaker content graph, nodes represent speech signals and edges represent speaker similarity. First, we describe a layout algorithm that optimizes content graphs for ease of navigability. We then present an interactive tool set that allows an end user to find and explore interesting occurrences in the corpus. We also present a tool set that allows a researcher to visualize the shortcomings of current content graph generation algorithms. RECOG's layout algorithm and tool sets were implemented as Gephi plugins [1].
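
The sketch below shows one simple way to construct a speaker content graph of the kind the summary describes (nodes are recordings, edges are strong speaker-similarity links) and export it in a format Gephi can load. The utterance IDs, the random stand-in similarity scores, and the threshold are all placeholders; this is not the RECOG implementation.

```python
# Minimal sketch (not RECOG itself): build a speaker content graph from pairwise
# similarity scores and export it for Gephi.
import itertools, random
import networkx as nx

recordings = [f"utt_{i:03d}" for i in range(20)]   # hypothetical utterance IDs
random.seed(0)

G = nx.Graph()
G.add_nodes_from(recordings)
for a, b in itertools.combinations(recordings, 2):
    score = random.random()              # stand-in for a speaker-comparison score
    if score > 0.9:                      # keep only strong links for navigability
        G.add_edge(a, b, weight=score)

nx.write_gexf(G, "content_graph.gexf")   # GEXF is Gephi's native exchange format
```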

Novel graph processor architecture

Published in:
Lincoln Laboratory Journal, Vol. 20, No. 1, 2013, pp. 92-104.

Summary

Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are hard-pressed to handle the throughput and memory requirements of graph computation. Lincoln Laboratory's graph-processor architecture represents a fundamental rethinking of processor architecture. It utilizes innovations that include high-bandwidth three-dimensional (3D) communication links, a sparse matrix-based graph instruction set, an accelerator-based architecture, a systolic sorter, randomized communications, a cacheless memory system, and 3D packaging.
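
The "sparse matrix-based graph instruction set" mentioned above rests on the duality between graph traversal and sparse linear algebra. The short sketch below illustrates that duality in software only, with scipy standing in for the custom hardware; it is not the processor's instruction set.

```python
# Illustrative only: one breadth-first frontier expansion expressed as a sparse
# matrix-vector multiply, the core primitive a sparse-matrix graph ISA accelerates.
import numpy as np
from scipy.sparse import csr_matrix

# Adjacency matrix of a small directed graph: edge i -> j is stored as A[j, i] = 1.
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
rows = [j for _, j in edges]
cols = [i for i, _ in edges]
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(5, 5))

frontier = np.zeros(5)
frontier[0] = 1.0                                  # start the search at vertex 0
hop1 = (A @ frontier) > 0                          # vertices one edge away: {1}
hop2 = (A @ hop1.astype(float)) > 0                # vertices two edges away: {2, 3}
print(np.flatnonzero(hop1), np.flatnonzero(hop2))
```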

Taming biological big data with D4M

Published in:
Lincoln Laboratory Journal, Vol. 20, No. 1, 2013, pp. 82-91.

Summary

The supercomputing community has taken up the challenge of "taming the beast" spawned by the massive amount of data available in the bioinformatics domain: How can these data be exploited faster and better? MIT Lincoln Laboratory computer scientists demonstrated how a new Laboratory-developed technology, the Dynamic Distributed Dimensional Data Model (D4M), can be used to accelerate DNA sequence comparison, a core operation in bioinformatics.
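
As a rough illustration of why an associative-array model helps here, the sketch below represents each DNA sequence as a sparse set of k-mers, so that all-pairs comparison reduces to counting shared k-mers (in effect a sparse matrix multiply). Plain Python dictionaries stand in for D4M associative arrays, and the sequences and value of k are made up; this is the idea, not the D4M library.

```python
# Minimal sketch of the idea (not the D4M library): sequences as sparse k-mer sets,
# comparison as counting shared k-mers across every pair of sequences.
from collections import defaultdict

def kmers(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

sequences = {
    "sample_A": "ACGTACGTGACC",
    "sample_B": "TTACGTACGAAG",
    "sample_C": "GGGGCCCCAAAA",
}

# Build the sparse (sequence x k-mer) incidence structure.
index = defaultdict(set)                      # k-mer -> set of sequence IDs
for name, seq in sequences.items():
    for km in kmers(seq):
        index[km].add(name)

# "A * A^T": count shared k-mers for every pair of sequences.
overlap = defaultdict(int)
for ids in index.values():
    for a in ids:
        for b in ids:
            if a < b:
                overlap[(a, b)] += 1

print(dict(overlap))   # sample_A and sample_B share several ACGT-related k-mers
```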

Detection theory for graphs

Summary

Graphs are fast emerging as a common data structure used in many scientific and engineering fields. While a wide variety of techniques exist to analyze graph datasets, practitioners currently lack a signal processing theory akin to that of detection and estimation in the classical setting of vector spaces with Gaussian noise. Using practical detection examples involving large, random "background" graphs and noisy real-world datasets, the authors present a novel graph analytics framework that allows for uncued analysis of very large datasets. This framework combines traditional computer science techniques with signal processing in the context of graph data, creating a new research area at the intersection of the two fields.
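
To give a flavor of what detection against a random "background" graph can look like, the sketch below uses one common residuals-based statistic: embed the graph with the top eigenvectors of the modularity matrix B = A - kk^T/(2m) and flag vertices with unusually large projections. The graph sizes, the planted dense subgraph, and the particular statistic are illustrative assumptions, not necessarily the exact test developed by the authors.

```python
# Hedged sketch of a residuals-based detection statistic on a random background graph.
import numpy as np
import networkx as nx

G = nx.gnp_random_graph(200, 0.05, seed=1)             # random "background" graph
anomaly = range(12)                                    # planted dense foreground
G.add_edges_from((i, j) for i in anomaly for j in anomaly if i < j)

A = nx.to_numpy_array(G)
k = A.sum(axis=1)
B = A - np.outer(k, k) / k.sum()                       # modularity (residual) matrix
vals, vecs = np.linalg.eigh(B)
embedding = vecs[:, -2:]                               # top-2 eigenvector embedding
score = np.linalg.norm(embedding, axis=1)              # per-vertex detection statistic
print(np.argsort(score)[-12:])                         # planted vertices should rank highly
```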

Social network analysis with content and graphs

Published in:
Lincoln Laboratory Journal, Vol. 20, No. 1, 2013, pp. 62-81.

Summary

Social network analysis has undergone a renaissance with the ubiquity and quantity of content from social media, web pages, and sensors. This content is a rich data source for constructing and analyzing social networks, but its enormity and unstructured nature also present multiple challenges. Work at Lincoln Laboratory is addressing the problems in constructing networks from unstructured data, analyzing the community structure of a network, and inferring information from networks. Graph analytics have proven to be valuable tools in solving these challenges. Through the use of these tools, Laboratory researchers have achieved promising results on real-world data. A sampling of these results is presented in this article.
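
One of the challenges named above, constructing networks from unstructured data, can be illustrated with a very simple co-mention graph: entities that appear in the same document get linked, with the co-occurrence count as the edge weight. The documents and the naive capitalized-word "entity extractor" below are placeholders for real text and real named-entity recognition.

```python
# Illustrative sketch only: building a social network from unstructured text.
import itertools, re
import networkx as nx

documents = [
    "Alice met Bob and Carol at the conference.",
    "Bob emailed Carol about the draft.",
    "Dave published a report.",
]

G = nx.Graph()
for doc in documents:
    entities = set(re.findall(r"\b[A-Z][a-z]+\b", doc))   # crude stand-in for NER
    for a, b in itertools.combinations(sorted(entities), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)                     # co-mention count as weight

print(G.edges(data=True))   # Alice-Bob, Alice-Carol, Bob-Carol links emerge
```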

Graph embedding for speaker recognition

Published in:
Chapter in Graph Embedding for Pattern Analysis, 2013, pp. 229-260.

Summary

This chapter presents applications of graph embedding to the problem of text-independent speaker recognition. Speaker recognition is a general term encompassing multiple applications. At the core is the problem of speaker comparison: given two speech recordings (utterances), produce a score that measures speaker similarity. Using speaker comparison, other applications can be implemented: speaker clustering (grouping similar speakers in a corpus), speaker verification (verifying a claim of identity), speaker identification (identifying a speaker out of a list of potential candidates), and speaker retrieval (finding matches to a query set).
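
To show how the derived applications layer on top of the core comparison primitive, here is a hedged sketch that scores utterances by cosine similarity of fixed-length embeddings. The embeddings are random placeholders (a real system would derive them from the audio, e.g., as i-vectors), and the threshold and speaker names are assumptions.

```python
# Hedged sketch: speaker comparison as a similarity score between utterance
# embeddings, with verification and identification built on top of it.
import numpy as np

rng = np.random.default_rng(0)

def compare(u, v):
    """Speaker-comparison score: cosine similarity of two utterance embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

enrolled = {name: rng.normal(size=64) for name in ["spk_A", "spk_B", "spk_C"]}
test = enrolled["spk_B"] + 0.3 * rng.normal(size=64)     # noisy utterance from spk_B

# Verification: accept an identity claim if the score clears a threshold.
print("verify spk_B:", compare(test, enrolled["spk_B"]) > 0.5)

# Identification: pick the enrolled speaker with the highest comparison score.
print("identify:", max(enrolled, key=lambda n: compare(test, enrolled[n])))
```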

The MIT-LL/AFRL IWSLT-2011 MT System

Summary

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2011 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic to English and English to French TED-talk translation tasks. We also applied our existing ASR system to the TED-talk lecture ASR task. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2010 system, and experiments we ran during the IWSLT-2011 evaluation. Specifically, we focus on 1) speech recognition for lecture-like data, 2) cross-domain translation using MAP adaptation, and 3) improved Arabic morphology for MT preprocessing.
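
As a rough illustration of the cross-domain adaptation step (item 2), the sketch below blends in-domain counts with an out-of-domain translation distribution using a prior weight tau, one standard MAP-style formulation. The phrase, counts, and tau are made up, and the paper's exact formulation may differ.

```python
# Hedged sketch of MAP adaptation for phrase-translation probabilities.
def map_adapt(count_in, total_in, p_out, tau=10.0):
    """P_map(e|f) = (c_in(f, e) + tau * P_out(e|f)) / (c_in(f) + tau)."""
    return (count_in + tau * p_out) / (total_in + tau)

# Out-of-domain model: P(f|e) for French translations of the English phrase "now".
p_out = {"maintenant": 0.7, "actuellement": 0.3}
# Small in-domain (lecture) sample: "now" observed 5 times.
counts_in = {"maintenant": 1, "actuellement": 4}
total_in = sum(counts_in.values())

adapted = {f: map_adapt(counts_in.get(f, 0), total_in, p) for f, p in p_out.items()}
print(adapted)   # in-domain evidence shifts probability toward "actuellement"
```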