Publications

A subband approach to time-scale expansion of complex acoustic signals

Published in:
IEEE Trans. Speech Audio Process., Vol. 3, No. 6, November 1995, pp. 515-519.

Summary

A new approach to time-scale expansion of short-duration complex acoustic signals is introduced. Using a subband signal representation, channel phases are selected to preserve a desired time-scaled temporal envelope. The phase representation is derived from locations of events that occur within filter bank outputs. A frame-based generalization of the method imposes phase consistency across consecutive synthesis frames. The method is applied to synthetic and actual complex acoustic signals consisting of closely spaced, rapidly damped sine waves. Time-frequency resolution limitations are discussed.

Time-scale modification with inconsistent constraints

Published in:
Proc. 1995 Workshop on Applications of Signal Processing to Audio Acoustics, 15-18 October 1995.

Summary

A set theoretic estimation approach is introduced for time-scale modification of complex acoustic signals. The method determines a signal that meets, in a least-squared error sense, desired temporal and spectral envelope constraints that are inconsistent. These constraints are generalized within the set theoretic framework to include other signal characteristics such as instantaneous frequency and group delay. The approach can enhance acoustic signals consisting of closely spaced sequential time components, and is applicable to biological, underwater, and music sound processing.

Sine-wave amplitude coding using a mixed LSF/PARCOR representation

Published in:
Proc. 1995 IEEE Workshop on Speech Coding for Telecommunications, 20-22 September 1995, pp. 77-8.

Summary

An all-pole model of the speech spectral envelope is used to code the sine-wave amplitudes in the Sinusoidal Transform Coder. While line spectral frequencies (LSFs) are currently used to represent this all-pole model, it is shown that a mixture of line spectral frequencies and partial correlation (PARCOR) coefficients can be used to reduce complexity without a loss in quantization efficiency. Objective and subjective measures demonstrate that speech quality is maintained. In addition, the use of split vector quantization is shown to substantially reduce the number of bits needed to code the all-pole model.
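
For readers unfamiliar with PARCOR coefficients, the following is a minimal sketch of how they arise from an all-pole model, assuming the standard Levinson-Durbin recursion on autocorrelation values. It is not the coder's actual procedure: the function name `levinson_durbin` and the sign convention (error filter A(z) = 1 + a1 z^-1 + ...) are illustrative assumptions, and sign conventions for PARCOR coefficients vary between texts.

```python
def levinson_durbin(r, order):
    """Fit an all-pole model to autocorrelation values r[0..order].

    Returns (a, k, err):
      a   - error-filter coefficients [1, a1, ..., a_order]
      k   - reflection (PARCOR) coefficients, one per recursion stage
      err - final prediction-error energy
    Sign convention is an assumption; some texts negate k.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
    return a, k, err


# Toy input: autocorrelation of a first-order process with correlation 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, k, err)
```

Each reflection coefficient of a stable model has magnitude below one, which is one reason such parameters are convenient to quantize.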

Measuring fine structure in speech: application to speaker identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, 9-12 May 1995, pp. 325-328.

Summary

The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude and frequency modulations of speech formants, high-resolution measurement of fundamental frequency, and location of "secondary pulses," measured using a high-resolution energy operator. When these features are added to traditional features using an existing SID system with a 168-speaker telephone speech database, SID performance improved by as much as 4% for male speakers and 8.2% for female speakers.

The effects of telephone transmission degradations on speaker recognition performance

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Speech, 9-12 May 1995, pp. 329-332.

Summary

The two largest factors affecting automatic speaker identification performance are the size of the population and the degradations introduced by noisy communication channels (e.g., telephone transmission). To examine these two factors experimentally, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TIMIT and NTIMIT databases. These are believed to be the first speaker identification experiments on the complete 630-speaker TIMIT and NTIMIT databases and the largest text-independent speaker identification task reported to date. Identification accuracies of 99.5% and 60.7% are achieved on the TIMIT and NTIMIT databases, respectively. This paper also presents experiments that examine and attempt to quantify the performance loss associated with various telephone degradations by systematically degrading the TIMIT speech in a manner consistent with measured NTIMIT degradations and measuring the performance loss at each step. It is found that the standard degradations of filtering and additive noise do not account for all of the performance gap between the TIMIT and NTIMIT data. Measurements of nonlinear microphone distortions are also...
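
The decision rule behind Gaussian-mixture speaker identification can be sketched as follows: score the test frames under each speaker's mixture model and pick the speaker with the highest total log-likelihood. This is only an illustration under stated assumptions. Diagonal covariances, the function names, and the toy model values are all hypothetical and are not taken from the paper.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comp_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))


def identify(frames, speaker_models):
    """Return (best speaker, per-speaker scores) for a list of feature frames."""
    scores = {
        name: sum(gmm_log_likelihood(x, *model) for x in frames)
        for name, model in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-speaker example: (weights, means, variances) per speaker.
models = {
    "spk_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "spk_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
best, scores = identify([[0.1, -0.2], [0.9, 1.1]], models)
print(best)
```

Summing per-frame log-likelihoods treats frames as independent, which is the usual simplification in text-independent scoring.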

Sinusoidal coding

Published in:
Chapter 4 in Speech Coding and Synthesis, Elsevier Science Publishers, 1995, pp. 121-173.

Summary

This chapter summarizes the sinewave-based pitch extractor, and the high-order all-pole modelling techniques that provided the basis for the multirate Sinusoidal Transform Coder and its application to multi-speaker conferencing.

Energy onset times for speaker identification

Published in:
IEEE Signal Process. Lett., Vol. 1, No. 11, November 1994, pp. 160-162.

Summary

Onset times of resonant energy pulses are measured with the high-resolution Teager operator and used as features in the Reynolds Gaussian-mixture speaker identification algorithm. Feature sets are constructed with primary pitch and secondary pulse locations derived from low and high speech formants. Preliminary testing was performed with a confusable 40-speaker subset from the NTIMIT (telephone channel) database. Speaker identification improved from 55% to 70% correct classification when the full set of new resonant energy-based features was added as an independent stream to conventional mel-cepstra.

Formant AM-FM for speaker identification

Published in:
Proc. IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, 25-28 October 1994, pp. 608-611.

Summary

The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude and frequency modulations of speech formants. Such modulations are measured using a high-resolution energy operator and related algorithms for recovering amplitude and frequency from an AM-FM signal. When these features are added to traditional features using an existing SID system with a telephone speech database, SID performance improved by as much as 15%. Energy onset time measurements that yielded improved SID performance are also discussed.

Energy separation in signal modulations with application to speech analysis

Published in:
IEEE Trans. Signal Process., Vol. 41, No. 10, October 1993, pp. 3024-3051.

Summary

Oscillatory signals that have both an amplitude-modulation (AM) and a frequency-modulation (FM) structure are encountered in almost all communication systems. We have also used these structures recently for modeling speech resonances, being motivated by previous work on investigating fluid dynamics phenomena during speech production that provide evidence for the existence of modulations in speech signals. In this paper, we use a nonlinear differential operator that can detect modulations in AM-FM signals by estimating the product of their time-varying amplitude and frequency. This operator essentially tracks the energy needed by a source to produce the oscillatory signal. To solve the fundamental problem of estimating both the amplitude envelope and instantaneous frequency of an AM-FM signal we develop a novel approach that uses nonlinear combinations of instantaneous signal outputs from the energy operator to separate its output energy product into its amplitude modulation and frequency modulation components. The theoretical analysis is done first for continuous-time signals. Then several efficient algorithms are developed and compared for estimating the amplitude envelope and instantaneous frequency of discrete-time AM-FM signals. These energy separation algorithms are then applied to search for modulations in speech resonances, which we model using AM-FM signals to account for time-varying amplitude envelopes and instantaneous frequencies. Our experimental results provide evidence that bandpass filtered speech signals around speech formants contain amplitude and frequency modulations within a pitch period. Overall, the energy separation algorithms, due to their very low computational complexity and instantaneously-adapting nature, are very useful in detecting modulation patterns in speech and other time-varying signals.
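
As a rough illustration of the idea, the sketch below applies the discrete Teager-Kaiser energy operator, psi[x](n) = x(n)^2 - x(n-1) x(n+1), and one commonly cited discrete energy separation algorithm (usually called DESA-2). For a sampled sinusoid A cos(W n + p) the operator output is A^2 sin^2(W), roughly A^2 W^2 for small W. Whether this matches the exact variants compared in the paper is an assumption, and the function names are illustrative.

```python
import math


def teager_kaiser(x):
    """Discrete energy operator on interior samples: x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """Estimate amplitude and frequency (radians/sample) with DESA-2.

    Output entry j corresponds to input sample j + 2. Assumes the
    frequency stays below pi/2 radians per sample.
    """
    psi_x = teager_kaiser(x)
    # Symmetric difference y[n] = x[n+1] - x[n-1] on interior samples.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_y = teager_kaiser(y)
    amps, freqs = [], []
    for j, py in enumerate(psi_y):
        px = psi_x[j + 1]  # align psi_x with psi_y (both at x index j + 2)
        if px <= 0.0 or py <= 0.0:
            # Operator output can be non-positive for noisy input; mark as undefined.
            amps.append(float("nan"))
            freqs.append(float("nan"))
            continue
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(c))
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs


# Constant-amplitude, constant-frequency test tone (A = 2, W = 0.3 rad/sample).
x = [2.0 * math.cos(0.3 * n + 0.7) for n in range(200)]
amps, freqs = desa2(x)
print(amps[0], freqs[0])
```

The estimates use only a five-sample neighborhood of each point, which is what makes this family of algorithms cheap and fast to adapt.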

Detection of transient signals using the energy operator

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 3, ICASSP, 27-30 April 1993, pp. 145-148.

Summary

A function of the Teager-Kaiser energy operator is introduced as a method for detecting transient signals in the presence of amplitude-modulated and frequency-modulated tonal interference. This function has excellent time resolution and is robust in the presence of white noise. The output of the detection function is also independent of the interference-to-transient ratio when that ratio is large. It is demonstrated that the detection function can be applied to interference signals with multiple amplitude-modulated and frequency-modulated tonal components.