Publications
AI enabling technologies: a survey
Summary
Artificial Intelligence (AI) has the opportunity to revolutionize the way the United States Department of Defense (DoD) and Intelligence Community (IC) address the challenges of evolving threats, data deluge, and rapid courses of action. Developing an end-to-end artificial intelligence system involves parallel development of different pieces that must work together...
A billion updates per second using 30,000 hierarchical in-memory D4M databases
Summary
Analyzing large scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for representing and analyzing streaming network data. The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in...
Hyperscaling internet graph analysis with D4M on the MIT SuperCloud
Summary
Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of...
Large-scale Bayesian kinship analysis
Summary
Kinship prediction in forensics is limited to first degree relatives due to the small number of short tandem repeat loci characterized. The Genetic Chain Rule for Probabilistic Kinship Estimation can leverage large panels of single nucleotide polymorphisms (SNPs) or sets of sequence linked SNPs, called haploblocks, to estimate more distant...
Interactive supercomputing on 40,000 cores for machine learning and data analysis
Summary
Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as...
GraphChallenge.org: raising the bar on graph analytic performance
Summary
The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...
Measuring the impact of Spectre and Meltdown
Summary
The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we...
Colorization of H&E stained tissue using deep learning
Summary
Histopathology is a critical tool in the diagnosis and stratification of cancer. Digital Pathology involves the scanning of stained and fixed tissue samples to produce high-resolution images that can be used for computer-aided diagnosis and research. A common challenge in digital pathology related to the quality and characteristics of staining...
Lessons learned from a decade of providing interactive, on-demand high performance computing to scientists and engineers
Summary
For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively...
Detecting pathogen exposure during the non-symptomatic incubation period using physiological data
Summary
Early pathogen exposure detection allows better patient care and faster implementation of public health measures (patient isolation, contact tracing). Existing exposure detection most frequently relies on overt clinical symptoms, namely fever, during the infectious prodromal period. We have developed a robust machine learning based method to better detect asymptomatic states...