Publications

Collaborative Data Analysis and Discovery for Cyber Security

June 22, 2016

Conference Paper

Author:

Diane P. Staheli

…

Published in:

Proceedings of the 12th Symposium on Usable Privacy and Security (SOUPS 2016)

Topic:

visualization

R&D area:

Cyber Security and Information Sciences

R&D group:

Cyber-Physical Systems

Summary

In this paper, we present the Cyber Analyst Real-Time Integrated Notebook Application (CARINA). CARINA is a collaborative investigation system that aids in decision making by co-locating the analysis environment with centralized cyber data sources, and providing next generation analysts with increased visibility to the work of others.

Channel compensation for speaker recognition using MAP adapted PLDA and denoising DNNs

June 21, 2016

Conference Paper

Author:

Frederick S. Richardson

…

Published in:

Odyssey 2016, The Speaker and Language Recognition Workshop, 21-24 June 2016.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Over several decades, speaker recognition performance has steadily improved for applications using telephone speech. A big part of this improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. For new data applications, such as audio from room microphones, we would like to effectively use existing telephone data to build systems with high accuracy while maintaining good performance on existing telephone tasks. In this paper we compare and combine approaches to compensate model parameters and features for this purpose. For model adaptation we explore MAP adaptation of hyper-parameters, and for feature compensation we examine the use of denoising DNNs. On a multi-room, multi-microphone speaker recognition experiment we show a reduction of 61% in EER with a combination of these approaches while slightly improving performance on telephone data.

The MITLL NIST LRE 2015 Language Recognition System

June 21, 2016

Conference Paper

Author:

Pedro A. Torres-Carrasquillo

…

Published in:

Odyssey 2016, 21-24 June 2016, pp. 196-203.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks respectively.

A vocal modulation model with application to predicting depression severity

June 14, 2016

Conference Paper

Author:

Rachelle Horwitz-Martin

…

Published in:

13th IEEE Int. Conf. on Wearable and Implantable Body Sensor Networks, BSN 2016, 14-17 June 2016.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Speech provides a potentially simple and noninvasive "on-body" means to identify and monitor neurological diseases. Here we develop a model for a class of vocal biomarkers exploiting modulations in speech, focusing on Major Depressive Disorder (MDD) as an application area. Two model components contribute to the envelope of the speech waveform: amplitude modulation (AM) from respiratory muscles, and AM from interaction between vocal tract resonances (formants) and frequency modulation in vocal fold harmonics. Based on the model framework, we test three methods to extract envelopes capturing these modulations of the third formant for synthesized sustained vowels. Using subsequent modulation features derived from the model, we predict MDD severity scores with a Gaussian Mixture Model. Performing global optimization over classifier parameters and number of principal components, we evaluate performance of the features by examining the root-mean-squared error (RMSE), mean absolute error (MAE), and Spearman correlation between the actual and predicted MDD scores. We achieved RMSE and MAE values of 10.32 and 8.46, respectively (Spearman correlation = 0.487, p < 0.001), relative to a baseline RMSE of 11.86 and MAE of 10.05, obtained by predicting the mean MDD severity score. Ultimately, our model provides a framework for detecting and monitoring vocal modulations that could also be applied to other neurological diseases.
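The comparison against a mean-predicting baseline described in this summary can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical and the severity scores are made up for the example.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_baseline(actual):
    """Baseline predictor: always predict the mean of the actual scores."""
    mean = sum(actual) / len(actual)
    return [mean] * len(actual)

# Made-up severity scores, for illustration only (not data from the paper).
actual = [5.0, 12.0, 20.0, 31.0]
predicted = [8.0, 10.0, 24.0, 27.0]
baseline = mean_baseline(actual)

print("model    RMSE/MAE:", rmse(actual, predicted), mae(actual, predicted))
print("baseline RMSE/MAE:", rmse(actual, baseline), mae(actual, baseline))
```

A model is only informative if both of its error values fall below the baseline's, which is the comparison the summary reports.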

BubbleNet: A Cyber Security Dashboard for Visualizing Patterns

June 6, 2016

Conference Paper

Author:

Sean P. McKenna

…

Published in:

Proceedings of the 2016 Eurographics Conference on Visualization (EuroVis)

Topic:

visualization

R&D area:

Cyber Security and Information Sciences

R&D group:

Cyber-Physical Systems

Summary

The field of cyber security is faced with ever-expanding amounts of data and a constant barrage of cyber attacks. Within this space, we have designed BubbleNet as a cyber security dashboard to help network analysts identify and summarize patterns within the data.

Operational assessment of keyword search on oral history

May 23, 2016

Conference Paper

Author:

Elizabeth E. Salesky

…

Published in:

10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Topic:

human language technology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). There are many inherent challenges in applying ASR to conversational speech: smaller training set sizes and varying demographics, among others. We assess the impact of dataset size, word error rate and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share and preserve their stories, and control for a variety of demographics including age, gender, birthplace, and dialect on four different training set sizes. We show comparable search performance using a standard speech recognition system as with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.

A fun and engaging interface for crowdsourcing named entities

May 23, 2016

Conference Paper

Author:

Kara B. Greenfield

…

Published in:

10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Topic:

human language technology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

There are many current problems in natural language processing that are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus has already been annotated. Annotating corpora for natural language processing tasks is typically a time consuming and expensive process. In this paper, we provide a case study in using crowd sourcing to curate an in-domain corpus for named entity recognition, a common problem in natural language processing. In particular, we present our use of fun, engaging user interfaces as a way to entice workers to partake in our crowd sourcing task while avoiding inflating our payments in a way that would attract more mercenary workers than conscientious ones. Additionally, we provide a survey of alternate interfaces for collecting annotations of named entities and compare our approach to those systems.

Enforced sparse non-negative matrix factorization

May 23, 2016

Conference Paper

Author:

Brendan E. Gavin

…

Published in:

30th IEEE Int. Parallel and Distributed Processing Symp., IPDPS 2016, 23-27 May 2016.

Topic:

supercomputing

R&D area:

Cyber Security and Information Sciences

R&D group:

Lincoln Laboratory Supercomputing Center

Summary

Non-negative matrix factorization (NMF) is a dimensionality reduction algorithm for data that can be represented as an undirected bipartite graph. It has become a common method for generating topic models of text data because it is known to produce good results, despite its relative simplicity of implementation and ease of computation. One challenge with applying the NMF to large datasets is that intermediate matrix products often become dense, thus stressing the memory and compute elements of the underlying system. In this article, we investigate a simple but powerful modification of the alternating least squares method of determining the NMF of a sparse matrix that enforces the generation of sparse intermediate and output matrices. This method enables the application of NMF to large datasets through improved memory and compute performance. Further, we demonstrate, empirically, that this method of enforcing sparsity in the NMF either preserves or improves both the accuracy of the resulting topic model and the convergence rate of the underlying algorithm.
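The idea of enforcing sparsity inside alternating least squares can be sketched roughly as below. The summary does not say how sparsity is enforced, so this sketch assumes one plausible rule: after each least-squares solve, clip negatives to zero and keep only the `t` largest entries of the factor. The names `keep_top_t`, `sparse_als_nmf`, `t_w`, and `t_h` are hypothetical, and a dense NumPy array stands in for a sparse matrix to keep the example short.

```python
import numpy as np

def keep_top_t(M, t):
    """Return a copy of M with all but its t largest entries set to zero."""
    if t <= 0:
        return np.zeros_like(M)
    if t >= M.size:
        return M.copy()
    flat = M.ravel()
    idx = np.argpartition(flat, -t)[-t:]  # indices of the t largest values
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(M.shape)

def sparse_als_nmf(A, k, t_w, t_h, iters=50, seed=0):
    """Approximate A (m x n) by W (m x k) times H (k x n), both non-negative,
    with at most t_w nonzeros in W and t_h nonzeros in H (assumed rule)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = np.zeros((k, n))
    for _ in range(iters):
        # Fix W, solve min ||W H - A|| for H, then project and sparsify.
        H = np.linalg.lstsq(W, A, rcond=None)[0]
        H = keep_top_t(np.maximum(H, 0.0), t_h)
        # Fix H, solve min ||H^T W^T - A^T|| for W, then project and sparsify.
        Wt = np.linalg.lstsq(H.T, A.T, rcond=None)[0]
        W = keep_top_t(np.maximum(Wt.T, 0.0), t_w)
    return W, H

# Tiny made-up term-document style matrix, for illustration only.
A = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 4.0, 0.0]])
W, H = sparse_als_nmf(A, k=2, t_w=4, t_h=6, iters=10)
print(np.count_nonzero(W), np.count_nonzero(H))
```

Capping the nonzero count bounds the memory used by every intermediate factor, which is the property the summary highlights for large datasets.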

LLMapReduce: multi-level map-reduce for high performance data analysis

May 23, 2016

Conference Paper

Author:

Chansup Byun

…

Published in:

IEEE Int. Parallel and Distributed Processing Symp., IPDPS 2016, 23-27 May 2016.

Topic:

high performance computing

R&D area:

Cyber Security and Information Sciences

R&D group:

Secure Resilient Systems and Technology

Summary

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming capability in one line of code. LLMapReduce supports all programming languages and many schedulers. LLMapReduce can work with any application without the need to modify the application. Furthermore, LLMapReduce can overcome scaling limits in the map-reduce parallel programming model via options that allow the user to switch to the more efficient single-program-multiple-data (SPMD) parallel programming model. These features allow users to reduce the computational overhead by more than 10x compared to standard map-reduce for certain applications. LLMapReduce is widely used by hundreds of users at MIT. Currently LLMapReduce works with several schedulers such as SLURM, Grid Engine and LSF.
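The difference between launching an application once per input and the SPMD style mentioned above can be illustrated with a small sketch. This is not LLMapReduce's interface or its actual distribution scheme; the function names are hypothetical, and the cyclic assignment of files to workers is an assumption chosen for simplicity.

```python
def map_reduce_launches(files):
    """Plain map step: one application launch per input file."""
    return [[f] for f in files]

def spmd_assignments(files, n_workers):
    """SPMD style: n_workers copies of the same program start once each,
    and the copy with a given rank processes every n_workers-th file."""
    return [files[rank::n_workers] for rank in range(n_workers)]

# Hypothetical input file names.
files = [f"input_{i}.dat" for i in range(12)]

per_file = map_reduce_launches(files)
per_worker = spmd_assignments(files, n_workers=3)

print("launches with one task per file:", len(per_file))
print("launches with SPMD workers:     ", len(per_worker))
for rank, share in enumerate(per_worker):
    print(rank, share)
```

When application startup is expensive relative to the work per file, paying that cost once per worker rather than once per file is one way the overhead can shrink.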

Generating a multiple-prerequisite attack graph

May 17, 2016

Author:

Richard P. Lippmann

…

Published in:

PATENT-9344444

Topic:

attack graphs

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In one aspect, a method to generate an attack graph includes determining if a potential node provides a first precondition equivalent to one of preconditions provided by a group of preexisting nodes on the attack graph. The group of preexisting nodes includes a first state node, a first vulnerability instance node, a first prerequisite node, and a second state node. The method also includes, if the first precondition is equivalent to one of the preconditions provided by the group of preexisting nodes, coupling a current node to a preexisting node providing the precondition equivalent to the first precondition using a first edge and if the first precondition is not equivalent to one of the preconditions provided by the group of preexisting nodes, generating the potential node as a new node on the attack graph and coupling the new node to the current node using a second edge.
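The reuse-or-create step described in this summary can be sketched as follows. This is an illustrative reading, not the patented implementation: the class and method names are hypothetical, precondition equivalence is modeled as simple equality, node types are omitted, and the edge direction and labels are assumptions.

```python
class AttackGraph:
    """Toy graph that avoids duplicating nodes offering the same precondition."""

    def __init__(self):
        self.provider = {}  # precondition -> id of the node that provides it
        self.edges = []     # (from_node, to_node, label)

    def add_node(self, node_id, precondition):
        """Register a preexisting node and the precondition it provides."""
        self.provider[precondition] = node_id

    def expand(self, current, potential_id, precondition):
        """Link `current` to an existing provider of `precondition` if one
        exists; otherwise create the potential node and link to it."""
        existing = self.provider.get(precondition)
        if existing is not None:
            # Equivalent precondition already on the graph: reuse that node.
            self.edges.append((current, existing, "first"))
            return existing
        # No equivalent precondition: the potential node becomes a new node.
        self.provider[precondition] = potential_id
        self.edges.append((current, potential_id, "second"))
        return potential_id

# Made-up node ids and preconditions, for illustration only.
graph = AttackGraph()
graph.add_node("state_0", "user access on host A")
reused = graph.expand("vuln_1", "state_1", "user access on host A")
created = graph.expand("vuln_1", "state_2", "root access on host B")
print(reused, created, graph.edges)
```

Reusing the existing node keeps the graph from growing a new branch each time the same precondition is reached by a different path.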
