Publications

An evaluation of audio-visual person recognition on the XM2VTS corpus using the Lausanne protocols

Published in:
Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-237 - IV-240.

Summary

A multimodal person recognition architecture has been developed for the purpose of improving overall recognition performance and for addressing channel-specific performance shortfalls. This multimodal architecture includes the fusion of a face recognition system with the MIT/LL GMM/UBM speaker recognition architecture. This architecture exploits the complementary and redundant nature of the face and speech modalities. The resulting multimodal architecture has been evaluated on the XM2VTS corpus using the Lausanne open set verification protocols, and demonstrates excellent recognition performance. The multimodal architecture also exhibits strong recognition performance gains over the performance of the individual modalities.

Robust speaker recognition with cross-channel data: MIT-LL results on the 2006 NIST SRE auxiliary microphone task

Published in:
Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-49 - IV-52.

Summary

One particularly difficult challenge for cross-channel speaker verification is the auxiliary microphone task introduced in the 2005 and 2006 NIST Speaker Recognition Evaluations, where training uses telephone speech and verification uses speech from multiple auxiliary microphones. This paper presents two approaches to compensate for the effects of auxiliary microphones on the speech signal. The first compensation method mitigates session effects through Latent Factor Analysis (LFA) and Nuisance Attribute Projection (NAP). The second approach operates directly on the recorded signal with noise reduction techniques. Results are presented that show a reduction in the performance gap between telephone and auxiliary microphone data.

Multisensor dynamic waveform fusion

Published in:
Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-577 - IV-580.

Summary

Speech communication is significantly more difficult in severe acoustic background noise environments, especially when low-rate speech coders are used. Non-acoustic sensors, such as radar sensors, vibrometers, and bone-conduction microphones, offer significant potential in these situations. We extend previous work on fixed waveform fusion from multiple sensors to an optimal dynamic waveform fusion algorithm that minimizes both additive noise and signal distortion in the estimated speech signal. We show that a minimum mean squared error (MMSE) waveform matching criterion results in a generalized multichannel Wiener filter, and that this filter will simultaneously perform waveform fusion, noise suppression, and cross-channel noise cancellation. Formal intelligibility and quality testing demonstrate significant improvement from this approach.
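As a rough illustration of the MMSE criterion mentioned in this summary, the sketch below fuses three simulated sensor channels with a static, real-valued multichannel Wiener solution, w = R_xx^{-1} r_xs. It is not the paper's dynamic algorithm: the sensor gains, noise levels, white-noise "speech" signal, and the function names are all assumptions made for the example, and the statistics are estimated from the clean signal, which a real system would not have.

```python
import numpy as np


def simulate_sensors(num_samples=20000, seed=0):
    """Create a toy source and three noisy sensor observations of it.

    The gains and noise levels are made-up values for illustration only.
    """
    rng = np.random.default_rng(seed)
    speech = rng.standard_normal(num_samples)      # stand-in for clean speech
    gains = np.array([1.0, 0.6, 0.3])              # hypothetical sensor gains
    noise_std = np.array([0.5, 1.0, 2.0])          # hypothetical noise levels
    noise = rng.standard_normal((gains.size, num_samples)) * noise_std[:, None]
    sensors = gains[:, None] * speech[None, :] + noise
    return speech, sensors


def mmse_fusion_weights(sensors, target):
    """Solve R_xx w = r_xs, the normal equations of the MMSE criterion."""
    n = sensors.shape[1]
    R_xx = sensors @ sensors.T / n                 # sensor covariance estimate
    r_xs = sensors @ target / n                    # sensor/target cross-correlation
    return np.linalg.solve(R_xx, r_xs)


def mse(target, estimate):
    """Mean squared error between the target and an estimate."""
    return float(np.mean((target - estimate) ** 2))


speech, sensors = simulate_sensors()
w = mmse_fusion_weights(sensors, speech)
fused_mse = mse(speech, w @ sensors)

# Best achievable error when each sensor is filtered on its own.
single_mse = [
    mse(speech, mmse_fusion_weights(sensors[i:i + 1], speech) @ sensors[i:i + 1])
    for i in range(sensors.shape[0])
]

print("fusion weights:", w)
print("fused MSE:", fused_mse)
print("single-sensor MSEs:", single_mse)
```

Because the single-sensor filters are special cases of the joint weight vector, the fused error can never exceed the best single-sensor error on the same data, which is the sense in which one filter performs fusion and noise suppression together.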

The MIT-LL/IBM 2006 speaker recognition system: high-performance reduced-complexity recognition

Published in:
Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-217 - IV-220.

Summary

Many powerful methods for speaker recognition have been introduced in recent years: high-level features, novel classifiers, and channel compensation methods. A common arena for evaluating these methods has been the NIST speaker recognition evaluation (SRE). In the NIST SRE from 2002-2005, a popular approach was to fuse multiple systems based upon cepstral features and different linguistic tiers of high-level features. With enough enrollment data, this approach produced dramatic error rate reductions and showed conceptually that better performance was attainable. A drawback of this approach is that many high-level systems were run independently, requiring significant computational resources. In 2006, MIT Lincoln Laboratory focused on a new system architecture that emphasized reduced complexity. This system was a carefully selected mixture of high-level techniques, new classifier methods, and novel channel compensation techniques. This new system has excellent accuracy and substantially reduced complexity. The performance and computational aspects of the system are detailed on a NIST 2006 SRE task.

Triage framework for resource conservation in a speaker identification system

Published in:
Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-69 - IV-72.

Summary

We present a novel framework for triaging (prioritizing and discarding) data to conserve resources for a speaker identification (SID) system. Our work is motivated by applications that require a SID system to process an overwhelming volume of audio data. We design a triage filter whose goal is to conserve recognizer resources while preserving relevant content. We propose triage methods that use signal quality assessment tools, a scaled-down version of the main recognizer itself, and a fusion of these measures. We define a new precision-based measure of effectiveness for our triage framework. Our experimental results with the 35-speaker tactical SID corpus bear out the validity of our approach.

PMatlab: parallel Matlab library for signal processing applications

Published in:
Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-1189 - IV-1192.

Summary

MATLAB is one of the most commonly used languages for scientific computing, with approximately one million users worldwide. At MIT Lincoln Laboratory, MATLAB is used by technical staff to develop sensor processing algorithms. MATLAB's popularity is based on the availability of high-level abstractions that reduce code development time. Due to the compute-intensive nature of scientific computing, these applications often require long running times and would benefit greatly from the increased performance offered by parallel computing. pMatlab implements partitioned global address space (PGAS) support via standard operator overloading techniques. The core data structures in pMatlab are distributed arrays and maps, which simplify parallel programming by removing the need for explicit message passing. This paper presents the pMatlab design and results for the HPC Challenge benchmark suite. Additionally, two case studies of pMatlab use are described.

Coverage maximization using dynamic taint tracing

Published in:
MIT Lincoln Laboratory Report TR-1112

Summary

We present COMET, a system that automatically assembles a test suite for a C program to improve line coverage, and give initial results for a prototype implementation. COMET works dynamically, running the program under a variety of instrumentations in a feedback loop that adds new inputs to an initial corpus with each iteration. One instrumentation in particular is crucial to the success of this approach: dynamic taint tracing. Inputs are labeled as tainted at the byte level and all read/write pairs in the program are augmented to track the flow of taint between memory objects. This allows COMET to determine from which bytes of which inputs the variables in conditions derive, thereby dramatically narrowing the search over inputs necessary to expose new code. On a test set of 13 example programs, COMET improves upon the level of coverage reached in random testing by an average of 23% relative, takes only about twice the time, and requires a tiny fraction of the number of inputs to do so.
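To make the idea of byte-level taint concrete, here is a minimal sketch of taint propagation in Python. It is not COMET, which instruments C programs; the `Tainted` wrapper, the `branch` helper, and `toy_program` are hypothetical names invented for this example. Each input byte carries its own offset as a label, arithmetic unions the labels, and every condition records which input offsets it depends on.

```python
class Tainted:
    """An integer value plus the set of input byte offsets it derives from."""

    def __init__(self, value, taint=frozenset()):
        self.value = value
        self.taint = frozenset(taint)

    @staticmethod
    def lift(x):
        """Wrap plain constants as untainted values."""
        return x if isinstance(x, Tainted) else Tainted(x)

    def __add__(self, other):
        other = Tainted.lift(other)
        # The result depends on every byte either operand depended on.
        return Tainted(self.value + other.value, self.taint | other.taint)

    def equals(self, other):
        other = Tainted.lift(other)
        return Tainted(self.value == other.value, self.taint | other.taint)


def taint_input(data):
    """Label each input byte with its own offset."""
    return [Tainted(b, {i}) for i, b in enumerate(data)]


def branch(cond, log, label):
    """Record which input bytes influence a condition, then evaluate it."""
    log.append((label, sorted(cond.taint)))
    return bool(cond.value)


def toy_program(data, log):
    """Hypothetical program under test; expects at least four input bytes."""
    buf = taint_input(data)
    checksum = buf[0] + buf[1]
    if branch(checksum.equals(buf[3]), log, "checksum"):
        if branch(buf[2].equals(0x7F), log, "magic"):
            return "deep"
        return "shallow"
    return "reject"


log = []
result = toy_program(bytes([1, 2, 0x7F, 3]), log)
print(result, log)
```

The log tells a test generator that the "magic" branch depends only on byte 2, so a search for inputs that flip that branch can leave the other bytes alone. This is the narrowing effect the summary describes, shown here on a toy scale.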

Auditory modeling as a basis for spectral modulation analysis with application to speaker recognition

Published in:
MIT Lincoln Laboratory Report TR-1119

Summary

This report explores auditory modeling as a basis for robust automatic speaker verification. Specifically, we have developed feature-extraction front-ends that incorporate (1) time-varying, level-dependent filtering, (2) variations in analysis filterbank size, and (3) nonlinear adaptation. Our methods are motivated both by a desire to mimic auditory processing better than traditional front-ends (e.g., the mel-cepstrum) and by reported gains in automatic speech recognition robustness exploiting similar principles. Traditional mel-cepstral features in automatic speaker recognition are derived from ~20 invariant band-pass filter weights, thereby discarding temporal structure from phase. In contrast, cochlear frequency decomposition can be more precisely modeled as the output of ~3500 time-varying, level-dependent filters. Auditory signal processing is therefore more resolved in frequency than mel-cepstral analysis and also derives temporal information. Furthermore, loss of level-dependence has been suggested to reduce human speech reception in adverse acoustic environments. We were thus motivated to employ a recently proposed level-dependent compressed gammachirp filterbank in feature extraction as well as vary the number of filters or filter weights to improve frequency resolution. We are also simulating nonlinear adaptation models of inner hair cell function along the basilar membrane that presumably mimic temporal masking effects. Auditory-based front-ends are being evaluated with the Lincoln Laboratory Gaussian mixture model recognizer on the TIMIT database under clean and noisy (additive Gaussian white noise) conditions. Preliminary results of features derived from our auditory models suggest that they provide complementary information to the mel-cepstrum under clean and noisy conditions, resulting in speaker recognition performance improvements.

Automatic language recognition via spectral and token based approaches

Published in:
Chapter 41 in Springer Handbook of Speech Processing and Communication, 2007, pp. 811-24.

Summary

Automatic language recognition from speech consists of algorithms and techniques that model and classify the language being spoken. Current state-of-the-art language recognition systems fall into two broad categories: spectral- and token-sequence-based approaches. In this chapter, we describe algorithms for extracting features and models representing these types of language cues and systems for making recognition decisions using one or more of these language cues. A performance assessment of these systems is also provided, in terms of both accuracy and computational considerations, using the National Institute of Standards and Technology (NIST) language recognition evaluation benchmarks.

Practical attack graph generation for network defense

Published in:
Proc. of the 22nd Annual Computer Security Applications Conf., IEEE, 11-15 December 2006, pp. 121-130.

Summary

Attack graphs are a valuable tool to network defenders, illustrating paths an attacker can use to gain access to a targeted network. Defenders can then focus their efforts on patching the vulnerabilities and configuration errors that allow the attackers the greatest amount of access. We have created a new type of attack graph, the multiple-prerequisite graph, that scales nearly linearly as the size of a typical network increases. We have built a prototype system using this graph type. The prototype uses readily available source data to automatically compute network reachability, classify vulnerabilities, build the graph, and recommend actions to improve network security. We have tested the prototype on an operational network with over 250 hosts, where it helped to discover a previously unknown configuration error. It has processed complex simulated networks with over 50,000 hosts in under four minutes.
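The basic question an attack graph answers, which hosts an attacker can reach by chaining vulnerabilities, can be sketched as a breadth-first search. The code below is a simplified host-level illustration and not the multiple-prerequisite graph described in this summary; the host names, ports, data layout, and the `compromised_hosts` function are assumptions invented for the example.

```python
from collections import deque


def compromised_hosts(start, reachable, vulnerable):
    """Return every host an attacker starting at `start` can take over.

    reachable:  dict mapping a source host to a set of (target, port) pairs
                the source can connect to (hypothetical reachability data).
    vulnerable: set of (host, port) pairs running an exploitable service.
    """
    owned = {start}
    queue = deque([start])
    while queue:
        src = queue.popleft()
        for target, port in reachable.get(src, ()):
            # A host falls if an owned host can reach a vulnerable port on it.
            if target not in owned and (target, port) in vulnerable:
                owned.add(target)
                queue.append(target)
    return owned


# Toy network: names and ports are made up for illustration.
reachable = {
    "internet": {("web", 80)},
    "web": {("db", 3306), ("mail", 25)},
    "db": {("admin", 22)},
}
vulnerable = {("web", 80), ("db", 3306)}

before = compromised_hosts("internet", reachable, vulnerable)
after_patch = compromised_hosts("internet", reachable, vulnerable - {("web", 80)})
print(sorted(before), sorted(after_patch))
```

Comparing the result before and after removing a single vulnerability shows, in miniature, how such an analysis can rank patches by how much attacker access they cut off.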