Publications
Tagged As
Automatic dysphonia recognition using biologically-inspired amplitude-modulation features
Summary
Summary
A dysphonia, or disorder of the mechanisms of phonation in the larynx, can create time-varying amplitude fluctuations in the voice. A model for band-dependent analysis of this amplitude modulation (AM) phenomenon in dysphonic speech is developed from a traditional communications engineering perspective. This perspective challenges current dysphonia analysis methods that...
Estimating and evaluating confidence for forensic speaker recognition
Summary
Summary
Estimating and evaluating confidence has become a key aspect of the speaker recognition problem because of the increased use of this technology in forensic applications. We discuss evaluation measures for speaker recognition and some of their properties. We then propose a framework for confidence estimation based upon scores and metainformation...
The MIT Lincoln Laboratory RT-04F diarization systems: applications to broadcast audio and telephone conversations
Summary
Summary
Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization has utility in making automatic transcripts more readable...
Channel compensation for SVM speaker recognition
Summary
Summary
One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions...
Fusing discriminative and generative methods for speaker recognition: experiments on switchboard and NFI/TNO field data
Summary
Summary
Discriminatively trained support vector machines have recently been introduced as a novel approach to speaker recognition. Support vector machines (SVMs) have a distinctly different modeling strategy in the speaker recognition problem. The standard Gaussian mixture model (GMM) approach focuses on modeling the probability density of the speaker and the background...
Speaker diarisation for broadcast news
Summary
Summary
It is often important to be able to automatically label 'who spoke when' during some audio data. This paper describes two systems for audio segmentation developed at CUED and MIT-LL and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure...
The MMSR bilingual and crosschannel corpora for speaker recognition research and evaluation
Summary
Summary
We describe efforts to create corpora to support and evaluate systems that meet the challenge of speaker recognition in the face of both channel and language variation. In addition to addressing ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and crosschannel dimensions. We report on...
Conversational telephone speech corpus collection for the NIST speaker recognition evaluation 2004
Summary
Summary
This paper discusses some of the factors that should be considered when designing a speech corpus collection to be used for text independent speaker recognition evaluation. The factors include telephone handset type, telephone transmission type, language, and (non-telephone) microphone type. The paper describes the design of the new corpus collection...
The mixer corpus of multilingual, multichannel speaker recognition data
Summary
Summary
This paper describes efforts to create corpora to support and evaluate systems that perform speaker recognition where channel and language may vary. Beyond the ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and cross channel dimensions. We report on specific data collection efforts at the...
High-level speaker verification with support vector machines
Summary
Summary
Recently, high-level features such as word idiolect, pronunciation, phone usage, prosody, etc., have been successfully used in speaker verification. The benefit of these features was demonstrated in the NIST extended data task for speaker verification; with enough conversational data, a recognition system can become familiar with a speaker and achieve...