Publications


The effect of text difficulty on machine translation performance -- a pilot study with ILR-related texts in Spanish, Farsi, Arabic, Russian and Korean

Published in:
4th Int. Conf. on Language Resources and Evaluation, LREC, 26-28 May 2004.

Summary

We report on initial experiments that examine the relationship between automated measures of machine translation performance (Doddington 2003; Papineni et al. 2001) and the Interagency Language Roundtable (ILR) scale of language proficiency/difficulty that has been in standard use for U.S. government language training and assessment for the past several decades (Child, Clifford, and Lowe 1993). The main question we ask is how technology-oriented measures of MT performance relate to the ILR difficulty levels, where we understand that a linguist with ILR proficiency level N is expected to be able to understand a document rated at level N, but to have increasing difficulty with documents at higher levels. In this paper, we find that some key aspects of MT performance track with ILR difficulty levels, primarily for MT output whose quality is good enough to be readable by human readers.
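
As a concrete illustration of the kind of automated measure cited above, the sketch below computes a segment-level BLEU score (Papineni et al. 2001) with NLTK. It is a generic illustration of the metric, not the paper's experimental setup; the example sentences are invented.

# Hypothetical example: segment-level BLEU with NLTK, for illustration only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the cat sat on the mat".split()]   # human reference translation(s)
hypothesis = "the cat sits on the mat".split()   # MT output to be scored

# Smoothing avoids zero scores on short segments with missing n-grams.
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")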

Conversational telephone speech corpus collection for the NIST speaker recognition evaluation 2004

Published in:
Proc. Language Resources and Evaluation Conf., LREC, 24-30 May 2004, pp. 587-590.

Summary

This paper discusses some of the factors that should be considered when designing a speech corpus collection to be used for text-independent speaker recognition evaluation. The factors include telephone handset type, telephone transmission type, language, and (non-telephone) microphone type. The paper describes the design of the new corpus collection being undertaken by the Linguistic Data Consortium (LDC) to support the 2004 and subsequent NIST speaker recognition evaluations. Some preliminary information on the resulting 2004 evaluation test set is offered.
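
As a sketch of how these design factors might be recorded per call side, the snippet below defines a minimal metadata record. The field names and category values are illustrative assumptions, not the LDC's actual schema.

# Hypothetical metadata record for one side of a collected call.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallSideMetadata:
    speaker_id: str
    language: str                   # e.g., "English"
    handset_type: str               # e.g., "cordless", "cellular", "speakerphone"
    transmission_type: str          # e.g., "landline", "cellular"
    microphone_type: Optional[str]  # non-telephone microphone, if any

side = CallSideMetadata("spk_0001", "English", "cellular", "cellular", None)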

The mixer corpus of multilingual, multichannel speaker recognition data

Published in:
Proc. Language Resources and Evaluation Conf., LREC, 24-30 May 2004, pp. 627-630.

Summary

This paper describes efforts to create corpora to support and evaluate systems that perform speaker recognition where channel and language may vary. Beyond the ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and cross-channel dimensions. We report on specific data collection efforts at the Linguistic Data Consortium and on ongoing research at the US Federal Bureau of Investigation and MIT Lincoln Laboratory. We cover the design and requirements of the collections and the final properties of the corpus, integrating discussion of data preparation, research, technology development, and evaluation on a grand scale.

High-level speaker verification with support vector machines

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 17-21 May 2004, pp. I-73 - I-76.

Summary

Recently, high-level features such as word idiolect, pronunciation, phone usage, and prosody have been successfully used in speaker verification. The benefit of these features was demonstrated in the NIST extended data task for speaker verification: with enough conversational data, a recognition system can become familiar with a speaker and achieve excellent accuracy. Typically, high-level-feature recognition systems produce a sequence of symbols from the acoustic signal and then perform recognition using the frequency and co-occurrence of symbols. We propose the use of support vector machines for performing the speaker verification task from these symbol frequencies. Support vector machines have been applied to text classification problems with much success. A difficulty in applying these methods is that standard text classification methods tend to smooth frequencies, which could degrade speaker verification performance. We derive a new kernel based upon standard log likelihood ratio scoring to address the limitations of text classification methods. We show that our methods achieve significant gains over standard methods for processing high-level features.
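
A minimal sketch of the symbol-frequency scoring idea appears below: symbol frequencies are scaled by an inverse-square-root background weighting before a linear kernel, in the spirit of a kernel derived from log likelihood ratio scoring. The exact weighting shown is an assumption for illustration, not the paper's derivation.

import numpy as np

def symbol_frequencies(symbols, vocab):
    # Normalized counts of each vocabulary symbol in a sequence.
    counts = np.array([symbols.count(v) for v in vocab], dtype=float)
    return counts / max(counts.sum(), 1.0)

def llr_weighted_kernel(p_a, p_b, p_background):
    # Linear kernel with each frequency scaled by 1/sqrt(background frequency).
    w = 1.0 / np.sqrt(np.maximum(p_background, 1e-6))
    return float(np.dot(p_a * w, p_b * w))

vocab = ["aa", "ae", "k", "t"]  # toy phone inventory
p_bg = symbol_frequencies(["aa", "k", "t", "t"], vocab)
p_1 = symbol_frequencies(["aa", "aa", "k"], vocab)
p_2 = symbol_frequencies(["t", "k", "aa"], vocab)
print(llr_weighted_kernel(p_1, p_2, p_bg))

The weighted frequency vectors could then be fed to a standard SVM, for example via a precomputed kernel matrix.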

Multisensor MELPE using parameter substitution

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 17-21 May 2004, pp. I-477 - I-480.

Summary

The estimation of speech parameters and the intelligibility of speech transmitted through low-rate coders, such as MELP, are severely degraded when there are high levels of acoustic noise in the speaking environment. The application of nonacoustic and nontraditional sensors, which are less sensitive to acoustic noise than the standard microphone, is being investigated as a means to address this problem. Sensors being investigated include the General Electromagnetic Motion Sensor (GEMS) and the Physiological Microphone (P-mic). As an initial effort in this direction, a multisensor MELPe coder using parameter substitution has been developed, where pitch and voicing parameters are obtained from the GEMS and P-mic sensors, respectively, and the remaining parameters are obtained as usual from a standard acoustic microphone. This parameter substitution technique is shown to produce significant and promising DRT intelligibility improvements over the standard 2400 bps MELPe coder in several high-noise military environments. Further work is in progress aimed at utilizing the nontraditional sensors for additional intelligibility improvements and for more effective lower-rate coding in noise.
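
The substitution idea itself is simple to express; the sketch below replaces the pitch and voicing fields of an acoustically derived MELP frame with sensor-derived estimates while keeping the remaining parameters. The structures and field names are hypothetical, for illustration only.

from dataclasses import dataclass, replace

@dataclass
class MelpFrame:
    pitch: float        # pitch estimate
    voicing: list       # bandpass voicing decisions
    lsf: list           # line spectral frequencies (spectral envelope)
    gain: list          # frame gain values

def substitute_parameters(acoustic: MelpFrame, gems_pitch: float,
                          pmic_voicing: list) -> MelpFrame:
    # Keep spectral and gain parameters from the acoustic microphone;
    # take pitch from the GEMS and voicing from the P-mic.
    return replace(acoustic, pitch=gems_pitch, voicing=pmic_voicing)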

A tutorial on text-independent speaker verification

Summary

This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the speech parameterization most commonly used in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step for dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications for structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
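
As a sketch of the Gaussian mixture scoring such a system uses, the snippet below computes an average per-frame log likelihood ratio between a speaker model and a background model and compares it to a threshold. It uses scikit-learn and random stand-in features; in practice the speaker model is usually MAP-adapted from the universal background model rather than trained independently as shown here.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
ubm_feats = rng.normal(size=(2000, 13))           # stand-in for cepstral features
spk_feats = rng.normal(loc=0.3, size=(400, 13))   # enrollment data for one speaker

ubm = GaussianMixture(n_components=8, covariance_type="diag").fit(ubm_feats)
spk = GaussianMixture(n_components=8, covariance_type="diag").fit(spk_feats)

def verify(test_feats, threshold=0.0):
    # score_samples returns per-frame log likelihoods; average over frames.
    llr = spk.score_samples(test_feats).mean() - ubm.score_samples(test_feats).mean()
    return llr, llr > threshold

print(verify(rng.normal(loc=0.3, size=(300, 13))))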

Analysis of multitarget detection for speaker and language recognition

Published in:
ODYSSEY 2004, 31 May-4 June 2004.

Summary

The general multitarget detection (or open-set identification) task is the intersection of the more common tasks of closed-set identification and open-set verification/detection. In this task, a bank of parallel detectors processes an input and must decide whether the input is from one of the target classes and, if so, which one (or a small set containing the true one). In this paper, we analyze theoretically and empirically the behavior of a multitarget detector, relating the identification confusion error and the miss and false alarm detection errors in predicting performance. We show analytically that the performance of a multitarget detector can be predicted from single-detector performance, and we confirm this with speaker and language recognition data and experiments.
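
To make the prediction concrete, the sketch below computes the multitarget false alarm probability implied by a single-detector false alarm rate, under an independence assumption across the bank of detectors. This simplification is for illustration and is not necessarily the paper's exact derivation.

def multitarget_false_alarm(p_fa_single: float, n_targets: int) -> float:
    # Probability that at least one of n independent detectors
    # fires on a non-target input.
    return 1.0 - (1.0 - p_fa_single) ** n_targets

for n in (1, 10, 100):
    print(n, multitarget_false_alarm(0.01, n))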

Automated lip-reading for improved speech intelligibility

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 17-21 May 2004, pp. I-701 - I-704.

Summary

Various psycho-acoustical experiments have concluded that visual features strongly affect the perception of speech. This contribution is most pronounced in noisy environments, where the intelligibility of audio-only speech degrades quickly. This paper explores the effectiveness of extracted visual features, such as lip height and width, for improving speech intelligibility in noisy environments. The intelligibility content of these features is investigated through an intelligibility test on an animated rendition of the video generated from the extracted features, as well as on the original video. These experiments demonstrate that the extracted video features do contain important aspects of intelligibility that may be utilized in augmenting speech enhancement and coding applications. Alternatively, these extracted visual features can be transmitted in a bandwidth-efficient way to augment speech coders.
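
A minimal sketch of the geometric features in question appears below: lip width and height computed from tracked mouth landmarks. The landmark coordinates are hypothetical inputs; a real system would obtain them from a face or lip tracker.

import numpy as np

def lip_features(left_corner, right_corner, top_lip, bottom_lip):
    # Return (width, height) of the mouth opening, in pixels.
    width = np.linalg.norm(np.subtract(right_corner, left_corner))
    height = np.linalg.norm(np.subtract(top_lip, bottom_lip))
    return width, height

print(lip_features((100, 200), (160, 202), (130, 188), (130, 214)))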

High performance computing productivity model synthesis

Published in:
Int. J. High Perform. Comp. App., Vol. 18, No. 4, Winter 2004, pp. 505-516.

Summary

The Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) program is developing systems that deliver increased value to users at a rate commensurate with the rate of improvement in the underlying technologies. For example, if the relevant technology were silicon, the goal of such a system would be to double in productivity (or value) every 18 months, following Moore's law. The key questions are how to define and measure productivity, and what underlying technologies affect it. The goal of this paper is to synthesize from several different productivity models a single model that captures the main features of all of them. In addition, we begin putting the model on an empirical foundation by incorporating selected results from the software engineering and high performance computing (HPC) communities. An asymptotic analysis of the model is conducted to check that it makes sense in certain special cases. The model is extrapolated to an HPC context and several examples are explored, including HPC centers, HPC users, and interactive grid computing. Finally, the model hints at a profoundly different way of viewing HPC systems, in which the user must be included in the equation and innovative hardware is key to lowering the very high costs of HPC software.
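
One common formulation in the HPCS literature expresses productivity as utility delivered per unit cost; the sketch below pairs that ratio with a Moore's-law-style doubling of delivered value every 18 months. The functional forms here are illustrative assumptions, not the specific model synthesized in the paper.

def productivity(utility: float, cost: float) -> float:
    # Productivity as value (utility) delivered per unit total cost.
    return utility / cost

def moores_law_value(initial_value: float, years: float,
                     doubling_period_years: float = 1.5) -> float:
    # Value doubling every 18 months, following Moore's law.
    return initial_value * 2.0 ** (years / doubling_period_years)

print(moores_law_value(1.0, 3.0))  # two doublings -> 4.0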

HPC productivity: an overarching view

Published in:
Int. J. High Perform. Comp. App., Vol. 18, No. 4, Winter 2004, pp. 393-397.

Summary

The Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) program is focused on providing a new generation of economically viable high productivity computing systems for national security and for the industrial user community. The value of a high performance computing (HPC) system to a user includes many factors, such as execution time on a particular problem, software development time, direct hardware costs, and indirect administrative and maintenance costs. This special issue, which focuses on HPC productivity, brings together, for the first time, a series of novel papers written by several distinguished authors who share their views on this topic. The topic of productivity in HPC is very new and the authors have been encouraged to speculate. The goal of this first paper is to present an overarching context and framework for the other papers and to define some common ideas that have emerged in considering the problem of HPC productivity. In addition, this paper defines several characteristic HPC workflows that are useful for understanding how users exploit HPC systems, and discusses the role of activity and purpose benchmarks in establishing an empirical basis for HPC productivity.