Comparison of four approaches to automatic language identification of telephone speech
January 1, 1996
Journal Article
Published in:
IEEE Trans. Speech Audio Process., Vol. 4, No. 1, January 1996, pp. 31-44.
Summary
We have compared the performance of four approaches for automatic language identification of speech utterances: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone recognition (PPR). These approaches, which span a wide range of training requirements and levels of recognition complexity, were evaluated with the Oregon Graduate Institute Multi-Language Telephone Speech Corpus. Systems containing phone recognizers performed better than the simpler GMM classifier. The top-performing system was parallel PRLM, which exhibited an error rate of 2% for 45-s utterances and 5% for 10-s utterances in two-language, closed-set, forced-choice classification. The error rate for 11-language, closed-set, forced-choice classification was 11% for 45-s utterances and 21% for 10-s utterances.
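To make the PRLM idea concrete, the following is a minimal sketch of its language-modeling back end only, not the system evaluated in the article. It assumes the phone recognizer has already turned each utterance into a sequence of phone symbols, trains one interpolated bigram model per language, and picks the language whose model gives the highest average log-likelihood. The function names, the toy phone strings, the interpolation weight `alpha=0.7`, and the add-one smoothing of the unigram term are all illustrative assumptions.

```python
import math
from collections import Counter


def train(sequences):
    """Collect unigram, context, and bigram counts for one language."""
    uni, ctx, bi = Counter(), Counter(), Counter()
    for seq in sequences:
        uni.update(seq)
        for a, b in zip(seq, seq[1:]):
            ctx[a] += 1        # times `a` is followed by some symbol
            bi[(a, b)] += 1    # times `a` is followed by `b`
    return uni, ctx, bi


def avg_log_likelihood(seq, model, vocab_size, alpha=0.7):
    """Average log-probability per bigram under an interpolated bigram model.

    P(b | a) = alpha * P_bigram(b | a) + (1 - alpha) * P_unigram(b),
    where the unigram term uses add-one smoothing (an assumption made
    here so that unseen symbols never yield log(0)).
    """
    uni, ctx, bi = model
    total = sum(uni.values())
    pairs = list(zip(seq, seq[1:]))
    score = 0.0
    for a, b in pairs:
        p_uni = (uni[b] + 1) / (total + vocab_size)
        p_bi = bi[(a, b)] / ctx[a] if ctx[a] else 0.0
        score += math.log(alpha * p_bi + (1 - alpha) * p_uni)
    return score / max(len(pairs), 1)


def identify(seq, models, vocab_size):
    """Return the language whose model scores the phone sequence highest."""
    return max(
        models,
        key=lambda lang: avg_log_likelihood(seq, models[lang], vocab_size),
    )


# Hypothetical toy data: two "languages" over a two-symbol phone inventory.
training = {
    "A": [["a", "b", "a", "b", "a", "b"]],
    "B": [["a", "a", "b", "b", "a", "a", "b", "b"]],
}
models = {lang: train(seqs) for lang, seqs in training.items()}
guess = identify(["a", "b", "a", "b"], models, vocab_size=2)
```

Under the same assumptions, a parallel PRLM variant would run several such back ends, one per front-end phone recognizer, and combine their per-language scores before choosing the maximum.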