Publications

A comparison of subspace feature-domain methods for language recognition

Summary

Compensation of cepstral features for mismatch due to dissimilar train and test conditions has been critical for good performance in many speech applications. Mismatch is typically due to variability from changes in speaker, channel, gender, and environment. Common methods for compensation include RASTA, mean and variance normalization, VTLN, and feature warping. Recently, a new class of subspace methods for model compensation has become popular in language and speaker recognition: nuisance attribute projection (NAP) and factor analysis. A feature space version of latent factor analysis has been proposed. In this work, a feature space version of NAP is presented. This new approach, fNAP, is contrasted with feature domain latent factor analysis (fLFA). Both of these methods are applied to a NIST language recognition task. Results show the viability of the new fNAP method. Also, results indicate when the different methods perform best.
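
As a rough illustration of the projection idea behind NAP, the sketch below removes a low-dimensional nuisance subspace from feature vectors with the projection P = I - U U^T. It is not the fNAP or fLFA formulation of the paper, whose details are not given in this summary. The function names, the use of within-class scatter to estimate U, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def estimate_nuisance_subspace(X, labels, rank):
    """Return an orthonormal basis U (dim x rank) for the directions of
    largest within-class variability.
    Assumption: nuisance variability is modelled as within-class scatter."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centered = np.empty_like(X)
    for c in np.unique(labels):
        mask = labels == c
        # Remove each class mean so only within-class deviations remain.
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    # Right singular vectors are the principal directions of the deviations.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T

def nap_project(X, U):
    """Apply P = I - U U^T to every row of X without forming P explicitly."""
    X = np.asarray(X, dtype=float)
    return X - (X @ U) @ U.T

# Toy data (hypothetical): two classes separated along axis 1,
# with a strong nuisance offset along axis 0.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
class_means = np.array([[0.0, 2.0, 0.0], [0.0, -2.0, 0.0]])
offsets = rng.normal(scale=5.0, size=(100, 1)) * np.array([1.0, 0.0, 0.0])
noise = rng.normal(scale=0.1, size=(100, 3))
X = class_means[labels] + offsets + noise

U = estimate_nuisance_subspace(X, labels, rank=1)
X_comp = nap_project(X, U)
```

After projection, the compensated vectors have no component along the estimated nuisance direction, while the class separation along the second axis is largely preserved.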

A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 2542-2545.

Summary

The success of support vector machines (SVMs) for classification problems is often dependent on an appropriate normalization of the input feature space. This is particularly true in topic identification, where the relative contribution of the common but uninformative function words can overpower the contribution of the rare but informative content words in the SVM kernel function score if the feature space is not normalized properly. In this paper we apply the discriminative minimum classification error (MCE) training approach to the problem of learning an appropriate feature space normalization for use with an SVM classifier. Results are presented showing significant error rate reductions for an SVM-based system on a topic identification task using the Fisher corpus of audio recordings of human conversations.

Dialect recognition using adapted phonetic models

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 763-766.

Summary

In this paper, we introduce a dialect recognition method that makes use of phonetic models adapted per dialect without phonetically labeled data. We show that this method can be implemented efficiently within an existing PRLM system. We compare the performance of this system with other state-of-the-art dialect recognition methods (both acoustic and token-based) on the NIST LRE 2007 English and Mandarin dialect recognition tasks. Our experimental results indicate that this system can perform better than baseline GMM and adapted PRLM systems, and also results in consistent gains of 15-23% when combined with other systems.

Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition

Published in:
Proc. INTERSPEECH 2008, 22-26 September 2008, pp. 723-726.

Summary

This paper presents a series of dialect/accent identification results for three sets of dialects with discriminatively trained Gaussian mixture models and feature compensation using eigen-channel decomposition. The classification tasks evaluated in the paper include: 1) the Chinese language classes, 2) American- and Indian-accented English, and 3) discrimination between three Arabic dialects. The first two tasks were evaluated on the 2007 NIST LRE corpus. The Arabic discrimination task was evaluated using data derived from the LDC Arabic set collected by Appen. Analysis is performed for the English accent problem studied, and an approach to open set dialect scoring is introduced. The system resulted in equal error rates at or below 10% for each of the tasks studied.

The MITLL NIST LRE 2007 language recognition system

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2007 Language Recognition Evaluation. This system consists of a fusion of four core recognizers, two based on tokenization and two based on spectral similarity. Results for NIST's 14-language detection task are presented for both the closed-set and open-set tasks and for the 30, 10 and 3 second durations. On the 30 second 14-language closed set detection task, the system achieves a 1% equal error rate.

Two protocols comparing human and machine phonetic discrimination performance in conversational speech

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 1630-1633.

Summary

This paper describes two experimental protocols for direct comparison of human and machine phonetic discrimination performance in continuous speech. These protocols attempt to isolate phonetic discrimination while controlling for language and segmentation biases. Results of two human experiments are described, including comparisons with automatic phonetic recognition baselines. Our experiments suggest that in conversational telephone speech, human performance on these tasks exceeds that of machines by 15%. Furthermore, in a related controlled language model experiment, human subjects were better able to correctly predict words in conversational speech by 45%.

Beyond frame independence: parametric modelling of time duration in speaker and language recognition

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 767-770.

Summary

In this work, we address the question of generating accurate likelihood estimates from multi-frame observations in speaker and language recognition. Using a simple theoretical model, we extend the basic assumption of independent frames to include two refinements: a local correlation model across neighboring frames, and a global uncertainty due to train/test channel mismatch. We present an algorithm for discriminative training of the resulting duration model based on logistic regression combined with a bisection search. We show that using this model we can achieve state-of-the-art performance for the NIST LRE07 task. Finally, we show that these more accurate class likelihood estimates can be combined to solve multiple problems using Bayes' rule, so that we can expand our single parametric backend to replace all six separate back-ends used in our NIST LRE submission for both closed and open sets.
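
The last point, combining class likelihoods with Bayes' rule to answer different questions, can be sketched generically as follows. This is not the paper's duration model or back-end. The function names, the prior-weighted treatment of non-target classes, and the example scores are assumptions for illustration only.

```python
import numpy as np

def log_posteriors(log_likes, priors):
    """Bayes' rule in the log domain: log P(class | x) for every class."""
    log_joint = np.asarray(log_likes, dtype=float) + np.log(priors)
    # Subtract log p(x), computed stably as a log-sum-exp over classes.
    return log_joint - np.logaddexp.reduce(log_joint)

def detection_llr(log_likes, target, priors):
    """Log-likelihood ratio of the target class against the remaining
    classes, which are pooled as a prior-weighted mixture (an assumption)."""
    log_likes = np.asarray(log_likes, dtype=float)
    priors = np.asarray(priors, dtype=float)
    others = np.arange(len(log_likes)) != target
    weights = priors[others] / priors[others].sum()
    log_nontarget = np.logaddexp.reduce(log_likes[others] + np.log(weights))
    return log_likes[target] - log_nontarget

# Hypothetical per-class log-likelihoods for one test utterance.
scores = [-10.0, -12.0, -12.0]
uniform = np.full(3, 1.0 / 3.0)

post = log_posteriors(scores, uniform)          # identification view
llr = detection_llr(scores, target=0, priors=uniform)  # detection view
```

The same set of class likelihoods serves both an identification decision (largest posterior) and a per-class detection decision (threshold on the ratio), which is the kind of reuse the summary describes.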

Proficiency testing for imaging and audio enhancement: guidelines for evaluation

Published in:
Int. Assoc. of Forensic Sciences, IAFS, 21-26 July 2008.

Summary

Proficiency tests in the forensic sciences are vital in the accreditation and quality assurance process. Most commercially available proficiency tests are designed for examiners in the traditional forensic disciplines, such as latent prints, drug analysis, DNA, questioned documents, etc. Each of these disciplines is identification based. There are other forensic disciplines, however, where the output of the examination is not an identification of a person or substance. Two such disciplines are audio enhancement and video/image enhancement.

PVTOL: providing productivity, performance, and portability to DoD signal processing applications on multicore processors

Published in:
DoD HPCMP 2008, High Performance Computing Modernization Program Users Group Conf., 14 July 2008, pp. 327-333.

Summary

PVTOL provides an object-oriented C++ API that hides the complexity of multicore architectures within a PGAS programming model, improving programmer productivity. Tasks and conduits enable data flow patterns such as pipelining and round-robining. Hierarchical maps concisely describe how to allocate hierarchical arrays across processor and memory hierarchies and provide a simple API for moving data across these hierarchies. Functors encapsulate computational kernels; new functors can be easily developed using the PVTOL API and can be fused for more efficient computation. Existing computation and communication technologies that are optimized for various architectures are used to achieve high performance. PVTOL abstracts the details of the underlying processor architectures to provide portability. We are actively developing PVTOL for Intel, PowerPC and Cell architectures and intend to add support for more computational kernels on these architectures. FPGAs are becoming popular for accelerating computation in both the high performance computing (HPC) and high performance embedded computing (HPEC) communities. Integrated processor-FPGA technologies are now available from both HPC and HPEC vendors, e.g. Cray and Mercury Computer Systems. We plan to support FPGAs as co-processors in PVTOL. Finally, automated mapping technology has been demonstrated with pMatlab. We plan to begin implementing automated mapping in PVTOL next year. Similar to PVL, as PVTOL matures and is used in more projects at Lincoln, we plan to propose concepts demonstrated in PVTOL to HPEC-SI for adoption into future versions of VSIPL++.

Multicore programming in pMatlab using distributed arrays

Published in:
CLADE '08: Proceedings of the 6th international workshop on Challenges of large applications in distributed environments

Summary

Matlab is one of the most commonly used languages for scientific computing, with approximately one million users worldwide. Many of the programs written in Matlab can benefit from the increased performance offered by multicore processors and parallel computing clusters. The Lincoln pMatlab library (http://www.ll.mit.edu.ezproxyberklee.flo.org/pMatlab) allows high performance parallel programs to be written quickly using the distributed arrays programming paradigm. This talk provides an introduction to distributed arrays programming and describes the best programming practices for using distributed arrays to produce programs that perform well on multicore processors and parallel computing clusters. These practices include understanding the concepts of parallel concurrency vs. parallel data locality.
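
The distributed-array idea can be illustrated with a small serial simulation: an array is split into contiguous blocks, and each processor works only on the block it owns. This sketch is plain Python, not the pMatlab API. The helper names `block_ranges` and `local_sum` are hypothetical, and a real program would run the per-rank work concurrently rather than in a loop.

```python
def block_ranges(n, num_procs):
    """Split indices 0..n-1 into num_procs contiguous blocks.
    Earlier ranks absorb the remainder, so block sizes differ by at most 1."""
    base, extra = divmod(n, num_procs)
    ranges = []
    start = 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def local_sum(data, rank, num_procs):
    """Owner-computes rule: a rank touches only the elements it owns."""
    lo, hi = block_ranges(len(data), num_procs)[rank]
    return sum(data[lo:hi])

# Serial stand-in for four processors working on one array.
data = list(range(10))
partials = [local_sum(data, rank, 4) for rank in range(4)]
total = sum(partials)  # the only step that needs communication
```

Concurrency comes from the independent per-rank sums, and locality comes from each rank reading only its own block. Only the final combination of partial results needs data to move between processors.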