Publications

Talking Head Detection by Likelihood-Ratio Test

September 12, 2014

Conference Paper

Author:

Carl B. Quillen

…

Published in:

Second Workshop on Speech, Language, Audio in Multimedia

Topic:

machine translation

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Accurately detecting when a person whose face is visible in an audio-visual medium is also the audible speaker is an enabling technology with a number of useful applications. The likelihood-ratio test formulation and feature signal processing employed here allow the use of high-dimensional feature sets in the audio and visual domains, and the approach appears to have good detection performance for AV segments as short as a few seconds.
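
As a rough illustration of the likelihood-ratio test idea mentioned in this summary, the sketch below scores a short segment under two competing hypotheses and compares the accumulated log-likelihood ratio to a threshold. Everything here is an assumption made for illustration: the one-dimensional Gaussian models, the function names, and the threshold are hypothetical and are not taken from the paper, which works with high-dimensional audio and visual features.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (a stand-in for a real feature model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood_ratio(frames, h1, h0):
    """Sum per-frame log-likelihood ratios, assuming independent frames.

    h1 and h0 are hypothetical (mean, variance) pairs for the
    "visible face is the speaker" and "is not the speaker" hypotheses.
    """
    return sum(
        gaussian_logpdf(x, *h1) - gaussian_logpdf(x, *h0) for x in frames
    )

def detect(frames, h1, h0, threshold=0.0):
    """Accept H1 when the log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(frames, h1, h0) > threshold

# Toy usage with made-up scores for a short segment.
segment = [1.0, 1.2, 0.8]
print(detect(segment, h1=(1.0, 1.0), h0=(0.0, 1.0)))
```

Raising the threshold trades missed detections for fewer false alarms; the value used above is arbitrary.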

Autoregressive HMM speech synthesis

March 25, 2012

Conference Paper

Author:

Carl B. Quillen

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 25-30 March 2012, pp. 4021-4.

Topic:

speech modification

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Autoregressive HMM modeling of spectral features has been proposed as a replacement for standard HMM speech synthesis. The merits of the approach are explored, and methods for enforcing stability of the estimated predictor coefficients are presented. It appears that, rather than directly estimating autoregressive HMM parameters, greater synthesis accuracy is obtained by using a more traditional HMM recognition system to compute state-level posterior probabilities, which are then used to accumulate the statistics from which predictor coefficients are estimated. The result is a simplified mathematical framework that requires no modeling of derivatives and still provides smooth synthesis without unnatural spectral discontinuities. The resulting synthesis algorithm involves no matrix solves, may be formulated causally, and appears to yield quality very similar to that of more traditional HMM synthesis approaches. This paper describes the implementation of a complete autoregressive HMM LVCSR system and its application to synthesis, and reports preliminary synthesis results.

Kalman filter based speech synthesis

March 15, 2010

Conference Paper

Author:

Carl B. Quillen

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 4618-4621.

Topic:

speech modification

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Preliminary results are reported from a very simple speech-synthesis system based on clustered-diphone, Kalman-filter-based modeling of line-spectral-frequency features. Parameters were estimated using maximum-likelihood EM training, with a constraint that prevented eigenvalue magnitudes in the transition matrix from exceeding 1. Frames of training data were assigned diphone unit labels by forced alignment with an HMM recognition system. The HMM cluster tree was also used for Kalman filter unit cluster assignments. The result is a simple synthesis system that has few parameters, synthesizes intelligible speech without audible discontinuities, and can be adapted using MLLR techniques to support synthesis of a broad range of speakers from a single base model with small amounts of training data. This makes the approach of interest for embedded synthesis applications.
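
The eigenvalue constraint mentioned in this summary can be illustrated with a small sketch. The summary does not say how the constraint was enforced, so the approach below (uniformly rescaling a transition matrix whose spectral radius exceeds 1) is only one simple possibility, shown for a 2x2 matrix with hypothetical function names.

```python
import cmath

def eigenvalues_2x2(a):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

def spectral_radius(a):
    """Largest eigenvalue magnitude of a 2x2 matrix."""
    return max(abs(lam) for lam in eigenvalues_2x2(a))

def clip_spectral_radius(a, max_radius=1.0):
    """Rescale the matrix so no eigenvalue magnitude exceeds max_radius.

    Multiplying a matrix by a scalar multiplies every eigenvalue by the
    same scalar, so uniform scaling is enough to bound the radius.
    This is an illustrative choice, not the method used in the paper.
    """
    rho = spectral_radius(a)
    if rho <= max_radius:
        return [row[:] for row in a]
    scale = max_radius / rho
    return [[scale * v for v in row] for row in a]

# Toy usage: an unstable transition matrix is pulled back to radius 1.
unstable = [[1.2, 0.0], [0.0, 0.5]]
print(spectral_radius(clip_spectral_radius(unstable)))
```

Keeping eigenvalue magnitudes at or below 1 prevents the state recursion from growing without bound over long syntheses.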

Nuisance attribute projection

May 1, 2007

Book Chapter

Author:

Alex Solomonoff

…

Published in:

Chapter in Speech Communication, May 2007.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Cross-channel degradation is one of the significant challenges facing speaker recognition systems. We study this problem in the support vector machine (SVM) context and nuisance variable compensation in high-dimensional spaces more generally. We present an approach to nuisance variable compensation by removing nuisance attribute-related dimensions in the SVM expansion space via projections. Training to remove these dimensions is accomplished via an eigenvalue problem. The eigenvalue problem attempts to reduce multisession variation for the same speaker, reduce different channel effects, and increase "distance" between different speakers. Experiments show significant improvement in performance for the cross-channel case.
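
The projection step described in this summary can be sketched as removing the component of a feature vector that lies along a set of nuisance directions, which is equivalent to applying P = I - V V^T for orthonormal columns V. In the chapter those directions come from an eigenvalue problem; in this minimal sketch they are hand-picked, and all names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project_out(x, directions):
    """Remove the components of x along the given directions.

    The directions are assumed to be orthonormal; under that assumption
    the loop below equals multiplying x by (I - V V^T).
    """
    y = list(x)
    for v in directions:
        coeff = sum(a * b for a, b in zip(y, v))
        y = [a - coeff * b for a, b in zip(y, v)]
    return y

# Toy usage: treat the third axis as the nuisance (channel) direction.
nuisance = [normalize([0.0, 0.0, 2.0])]
print(project_out([3.0, 4.0, 5.0], nuisance))
```

After projection the vector has no component along the nuisance direction, so a kernel computed on projected vectors ignores variation in that subspace.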

The 2004 MIT Lincoln Laboratory speaker recognition system

March 19, 2005

Conference Paper

Author:

Douglas A. Reynolds

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 19-23 March 2005, pp. I-177 - I-180.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The MIT Lincoln Laboratory submission for the 2004 NIST Speaker Recognition Evaluation (SRE) was built upon seven core systems using speaker information from short-term acoustics, pitch and duration prosodic behavior, and phoneme and word usage. These different levels of information were modeled and classified using Gaussian Mixture Models, Support Vector Machines, and N-gram language models, and were combined using a single-layer perceptron fuser. The 2004 SRE used a new multi-lingual, multi-channel speech corpus that provided a challenging speaker detection task for the above systems. In this paper we describe the core systems used and provide an overview of their performance on the 2004 SRE detection tasks.

Channel compensation for SVM speaker recognition

May 31, 2004

Conference Paper

Author:

Alex Solomonoff

…

Published in:

Odyssey, The Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions, but for the much more recent Support Vector Machine (SVM) based approaches to this problem, much less is known about the best way to handle this issue. In this paper we explore techniques that are specific to the SVM framework in order to derive fully non-linear channel compensations. The result is a system that is less sensitive to specific kinds of labeled channel variations observed in training.

Beyond cepstra: exploiting high-level information in speaker recognition

December 11, 2003

Conference Paper

Author:

Douglas A. Reynolds

…

Published in:

Proc. Multimodal User Authentication Workshop, 11-12 December, 2003, pp. 223-9.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Traditionally, speaker recognition techniques have focused on using short-term, low-level acoustic information such as cepstral features extracted over 20-30 ms windows of speech. But speech is a complex behavior conveying more information about the speaker than merely the sounds that are characteristic of his vocal apparatus. This higher-level information includes speaker-specific prosodics, pronunciations, word usage, and conversational style. In this paper, we review some of the techniques to extract and apply these sources of high-level information, with results from the NIST 2003 Extended Data Task.
