Publications
Person authentication by voice: a need for caution
Summary
Summary
Because of recent events and as members of the scientific community working in the field of speech processing, we feel compelled to publicize our views concerning the possibility of identifying or authenticating a person from his or her voice. The need for a clear and common message was indeed shown...
Integration of speaker recognition into conversational spoken dialogue systems
Summary
Summary
In this paper we examine the integration of speaker identification/verification technology into two dialogue systems developed at MIT: the Mercury air travel reservation system and the Orion task delegation system. These systems both utilize information collected from registered users that is useful in personalizing the system to specific users and...
Model compression for GMM based speaker recognition systems
Summary
Summary
For large-scale deployments of speaker verification systems models size can be an important issue for not only minimizing storage requirements but also reducing transfer time of models over networks. Model size is also critical for deployments to small, portable devices. In this paper we present a new model compression technique...
Measuring the readability of automatic speech-to-text transcripts
Summary
Summary
This paper reports initial results from a novel psycholinguistic study that measures the readability of several types of speech transcripts. We define a four-part figure of merit to measure readability: accuracy of answers to comprehension questions, reaction-time for passage reading, reaction-time for question answering and a subjective rating of passage...
Combining cross-stream and time dimensions in phonetic speaker recognition
Summary
Summary
Recent studies show that phonetic sequences from multiple languages can provide effective features for speaker recognition. So far, only pronunciation dynamics in the time dimension, i.e., n-gram modeling on each of the phone sequences, have been examined. In the JHU 2002 Summer Workshop, we explored modeling the statistical pronunciation dynamics...
Channel robust speaker verification via feature mapping
Summary
Summary
In speaker recognition applications, channel variability is a major cause of errors. Techniques in the feature, model and score domains have been applied to mitigate channel effects. In this paper we present a new feature mapping technique that maps feature vectors into a channel independent space. The feature mapping learns...
Conditional pronunciation modeling in speaker detection
Summary
Summary
In this paper, we present a conditional pronunciation modeling method for the speaker detection task that does not rely on acoustic vectors. Aiming at exploiting higher-level information carried by the speech signal, it uses time-aligned streams of phones and phonemes to model a speaker's specific Pronunciation. Our system uses phonemes...
Phonetic speaker recognition using maximum-likelihood binary-decision tree models
Summary
Summary
Recent work in phonetic speaker recognition has shown that modeling phone sequences using n-grams is a viable and effective approach to speaker recognition, primarily aiming at capturing speaker-dependent pronunciation and also word usage. This paper describes a method involving binary-tree-structured statistical models for extending the phonetic context beyond that of...
The SuperSID project : exploiting high-level information for high-accuracy speaker recognition
Summary
Summary
The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have indeed produced very low error rates, they ignore other levels of information beyond low-level acoustics that convey speaker information. Recently published work has shown examples...
Using prosodic and conversational features for high-performance speaker recognition : report from JHU WS'02
Summary
Summary
While there has been a long tradition of research seeking to use prosodic features, especially pitch, in speaker recognition systems, results have generally been disappointing when such features are used in isolation and only modest improvements have been set when used in conjunction with traditional cepstral GMM systems. In contrast...