Publications

Speaker verification using support vector machines and high-level features

Published in:
IEEE Trans. on Audio, Speech, and Language Process., Vol. 15, No. 7, September 2007, pp. 2085-2094.

Summary

High-level characteristics such as word usage, pronunciation, phonotactics, prosody, etc., have seen a resurgence for automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker in current corpora, high-level systems now have the amount of data needed to sufficiently characterize a speaker. Although a significant amount of work has been done in finding novel high-level features, less work has been done on modeling these features. We describe a method of speaker modeling based upon support vector machines. Current high-level feature extraction produces sequences or lattices of tokens for a given conversation side. These sequences can be converted to counts and then frequencies of n-grams for a given conversation side. We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log likelihood ratio scoring system. Generalizations of this method are shown to produce excellent results on a variety of high-level features. We demonstrate that our methods produce results significantly better than standard log-likelihood ratio modeling. We also demonstrate that our system can perform well in conjunction with standard cepstral speaker recognition systems.
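
The step from token sequences to n-gram frequencies, and a kernel over those frequencies, can be illustrated with a minimal Python sketch. The function names and the 1/background weighting are assumptions made for illustration; the summary does not give the exact form of the kernel, so this should not be read as the paper's implementation.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Return relative frequencies of the n-grams in one token sequence."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def weighted_kernel(freq_a, freq_b, background):
    """Inner product of two frequency maps, each term divided by a
    background frequency.

    The 1/background weighting is an assumption of this sketch: it is one
    way a log-likelihood-ratio score can be approximated by a linear
    (inner-product) form.
    """
    shared = set(freq_a) & set(freq_b)
    return sum(
        freq_a[g] * freq_b[g] / background[g]
        for g in shared
        if background.get(g, 0) > 0
    )
```

A kernel of this kind could be passed to any SVM trainer that accepts a precomputed kernel matrix, with one frequency map per conversation side.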

pMATLAB parallel MATLAB library

Published in:
Int. J. High Perform. Comp. Appl., Vol. 21, No. 3, Fall 2007, pp. 336-359.

Summary

MATLAB has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with approximately one million users worldwide. The primary benefits of MATLAB are reduced code development time via high levels of abstractions (e.g. first class multi-dimensional arrays and thousands of built in functions), interpretive, interactive programming, and powerful mathematical graphics. The compute intensive nature of technical computing means that many MATLAB users have codes that can significantly benefit from the increased performance offered by parallel computing. pMatlab provides this capability by implementing parallel global array semantics using standard operator overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a "map" construct. Communication operations between distributed arrays are abstracted away from the user and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions up to four dimensions. pMatlab is built on top of the MatlabMPI communication library and runs on any combination of heterogeneous systems that support MATLAB, which includes Windows, Linux, MacOS X, and SunOS. This paper describes the overall design and architecture of the pMatlab implementation. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with the equivalent C+MPI codes. These results indicate that pMatlab can often achieve comparable performance to C+MPI, usually at one tenth the code size. Finally, we present implementation data collected from a sample of real pMatlab applications drawn from the approximately one hundred users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to an efficient pMatlab code in about 3 hours while changing less than 1% of their code.
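
As a rough illustration of what a distribution "map" encodes, the following Python sketch assigns the global indices of a one-dimensional array to processors under a block-cyclic layout. It is not pMatlab's API; the function names and parameters are hypothetical, and overlap between neighboring blocks is omitted.

```python
def block_cyclic_owner(index, block_size, num_procs):
    """Return the processor that owns a global index in a 1-D
    block-cyclic layout: blocks of block_size are dealt out round-robin."""
    return (index // block_size) % num_procs


def local_indices(proc, length, block_size, num_procs):
    """List the global indices stored on one processor."""
    return [
        i for i in range(length)
        if block_cyclic_owner(i, block_size, num_procs) == proc
    ]
```

With `block_size=1` this reduces to a cyclic distribution, and with a block size of `length / num_procs` (rounded up) it reduces to a plain block distribution. Redistribution between two such layouts amounts to moving each element from its owner under the first layout to its owner under the second.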

Construction of a phonotactic dialect corpus using semiautomatic annotation

Summary

In this paper, we discuss rapid, semiautomatic annotation techniques of detailed phonological phenomena for large corpora. We describe the use of these techniques for the development of a corpus of American English dialects. The resulting annotations and corpora will support both large-scale linguistic dialect analysis and automatic dialect identification. We delineate the semiautomatic annotation process that we are currently employing and a set of experiments we ran to validate this process. From these experiments, we learned that the use of ASR techniques could significantly increase the throughput and consistency of human annotators.

A comparison of speaker clustering and speech recognition techniques for air situational awareness

Published in:
INTERSPEECH 2007, 27-31 August 2007, pp. 2421-2424.

Summary

In this paper we compare speaker clustering and speech recognition techniques for the problem of understanding patterns of air traffic control communications. For a given radio transmission, our goal is to identify the talker and to whom he/she is speaking. This information, in combination with knowledge of the roles (e.g., takeoff, approach, hand-off, taxi) of different radio frequencies within an air traffic control region, could allow tracking of pilots through various stages of flight, thus providing the potential to monitor the airspace in great detail. Both techniques must contend with degraded audio channels and significant non-native accents. We report results from experiments using the nn-MATC database showing 9.3% and 32.6% clustering error for speaker clustering and ASR methods, respectively.

A new kernel for SVM MLLR based speaker recognition

Published in:
INTERSPEECH, 27-31 August 2007.

Summary

Speaker recognition using support vector machines (SVMs) with features derived from generative models has been shown to perform well. Typically, a universal background model (UBM) is adapted to each utterance yielding a set of features that are used in an SVM. We consider the case where the UBM is a Gaussian mixture model (GMM), and maximum likelihood linear regression (MLLR) adaptation is used to adapt the means of the UBM. We examine two possible SVM feature expansions that arise in this context: the first, a GMM supervector is constructed by stacking the means of the adapted GMM, and the second consists of the elements of the MLLR transform. We examine several kernels associated with these expansions. We show that both expansions are equivalent given an appropriate choice of kernels. Experiments performed on the NIST SRE 2006 corpus clearly highlight that our choice of kernels, which are motivated by distance metrics between GMMs, outperform ad-hoc ones. We also apply SVM nuisance attribute projection (NAP) to the kernels as a form of channel compensation and show that, with a proper choice of kernel, we achieve results comparable to existing SVM based recognizers.
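
The first feature expansion, stacking the adapted GMM means into a supervector, can be sketched in a few lines of Python. The scaling by mixture weight and variance shown here is an assumption for illustration (one common choice motivated by distances between GMMs with diagonal covariances); the summary does not state the exact kernels examined, and all names are hypothetical.

```python
from math import sqrt


def supervector(means):
    """Stack per-component mean vectors into one flat vector."""
    return [x for mean in means for x in mean]


def scaled_supervector(means, weights, variances):
    """Scale each mean element by sqrt(weight) / standard deviation
    before stacking (diagonal covariances assumed)."""
    return [
        sqrt(w) * m / sqrt(v)
        for mean, w, var in zip(means, weights, variances)
        for m, v in zip(mean, var)
    ]


def linear_kernel(a, b):
    """Plain inner product between two stacked vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Applying `linear_kernel` to two scaled supervectors gives a weighted, variance-normalized inner product of the adapted means, whereas applying it to unscaled supervectors corresponds to an ad-hoc choice that ignores the weights and covariances of the model.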

Improving phonotactic language recognition with acoustic adaptation

Published in:
INTERSPEECH 2007, 27-31 August 2007, pp. 358-361.

Summary

In recent evaluations of automatic language recognition systems, phonotactic approaches have proven highly effective. However, as most of these systems rely on underlying ASR techniques to derive a phonetic tokenization, these techniques are potentially susceptible to acoustic variability from non-language sources (e.g., gender, speaker, channel). In this paper we apply techniques from ASR research to normalize and adapt HMM-based phonetic models to improve phonotactic language recognition performance. Experiments we conducted with these techniques show an EER reduction of 29% over traditional PRLM-based approaches.

Variable projection and unfolding in compressed sensing

Published in:
Proc. 14th IEEE/SP Workshop on Statistical Signal Processing, 26-28 August 2007, pp. 358-362.

Summary

The performance of linear programming techniques that are applied in the signal identification and reconstruction process in compressed sensing (CS) is governed by both the number of measurements taken and the number of nonzero coefficients in the discrete basis used to represent the signal. To enhance the capabilities of CS, we have developed a technique called Variable Projection and Unfolding (VPU). VPU extends the identification and reconstruction capability of linear programming techniques to signals with a much greater number of nonzero coefficients in the basis in which the signals are compressible, and does so with significantly better reconstruction error.

Robust speaker recognition in noisy conditions

Published in:
IEEE Trans. Speech Audio Process., Vol. 15, No. 5, July 2007, pp. 1711-1723.

Summary

This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a coarse compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise. 
The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.

PANEMOTO: network visualization of security situational awareness through passive analysis

Summary

To maintain effective security situational awareness, administrators require tools that present up-to-date information on the state of the network in the form of 'at-a-glance' displays, and that enable rapid assessment and investigation of relevant security concerns through drill-down analysis capability. In this paper, we present a passive network monitoring tool we have developed to address these important requirements, known as Panemoto (PAssive NEtwork MOnitoring TOol). We show how Panemoto enumerates, describes, and characterizes all network components, including devices and connected networks, and delivers an accurate representation of the function of devices and logical connectivity of networks. We provide examples of Panemoto's output in which the network information is presented in two distinct but related formats: as a clickable network diagram (through the use of NetViz, a commercially available graphical display environment) and as statically-linked HTML pages, viewable in any standard web browser. Together, these presentation techniques enable a more complete understanding of the security situation of the network than each does individually.

Benchmarking the MIT LL HPCMP DHPI system

Published in:
Annual High Performance Computer Modernization Program Users Group Conf., 19-21 June 2007.

Summary

The Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) High Performance Computing Modernization Program (HPCMP) Dedicated High Performance Computing Project Investment (DHPI) system was designed to address interactive algorithm development for Department of Defense (DoD) sensor processing systems. The results of the system acceptance test provide a clear quantitative picture of the capabilities of the system. The system acceptance test for MIT LL HPCMP DHPI hardware involved an array of benchmarks that exercised each of the components of the memory hierarchy, the scheduler, and the disk arrays. These benchmarks isolated the components to verify the functionality and performance of the system, and several system issues were discovered and rectified by using these benchmarks. The memory hierarchy was evaluated using the HPC Challenge benchmark suite, which comprises the following benchmarks: High Performance Linpack (HPL, also known as Top 500), Fast Fourier Transform (FFT), STREAM, RandomAccess, and Effective Bandwidth. The compute nodes' Redundant Array of Independent Disks (RAID) arrays were evaluated with the Iozone benchmark. Finally, the scheduler and the reliability of the entire system were tested using both the HPC Challenge suite and the Iozone benchmark. For example, executing the HPC Challenge benchmark suite on 416 processors, the system was able to achieve 1.42 TFlops (HPL), 34.7 GFlops (FFT), 1.24 TBytes/sec (STREAM Triad), and 0.16 GUPS (RandomAccess). This paper describes the components of the MIT Lincoln Laboratory HPCMP DHPI system, including its memory hierarchy. We present the HPC Challenge benchmark suite and Iozone benchmark and describe how each of the component benchmarks stresses various components of the TX-2500 system. We discuss the results of the benchmarks and their implications for the performance of the system. We conclude with a presentation of the findings.