Publications

Rapid prototyping of radar algorithms

Published in:
IEEE Sig. Proc. Mag., Vol. 26, No. 6, November 2009, pp. 158-162.

Summary

Rapid prototyping of advanced signal processing algorithms is critical to developing new radars. Signal processing engineers usually use high-level languages such as MATLAB, IDL, or Python to develop advanced algorithms and to determine the optimal parameters for these algorithms. Many of these algorithms have very long execution times due to computational complexity and/or very large data sets, which hinders an efficient engineering development workflow: signal processing engineers must wait hours, or even days, for the results of the current algorithm, parameters, and data set before making changes and refinements for the next iteration. In the meantime, the engineer may have thought of several more permutations that he or she wants to test.
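
The iteration bottleneck described above is often attacked by batching parameter permutations and running them in parallel. A minimal sketch of that workflow, with a hypothetical stand-in for the expensive radar algorithm (the function names and parameters are illustrative, not from the paper):

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def run_detector(params):
    """Hypothetical stand-in for an expensive radar algorithm:
    score one (threshold, window) parameter setting."""
    threshold, window = params
    # Placeholder computation; a real run would process large data sets.
    return threshold * window

def sweep(thresholds, windows):
    """Evaluate every parameter combination in parallel, so the
    engineer can queue many permutations in one batch instead of
    waiting on each run serially."""
    combos = list(product(thresholds, windows))
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(run_detector, combos))
    return dict(zip(combos, scores))

if __name__ == "__main__":
    results = sweep([0.5, 0.9], [16, 32])
    best = max(results, key=results.get)
```

In practice the per-run cost, not the sweep machinery, dominates; the sketch only illustrates queuing many permutations at once.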

Sinewave parameter estimation using the fast Fan-Chirp Transform

Published in:
Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA, 18-21 October 2009, pp. 349-352.

Summary

Sinewave analysis/synthesis has long been an important tool for audio analysis, modification and synthesis [1]. The recently introduced Fan-Chirp Transform (FChT) [2,3] has been shown to improve the fidelity of sinewave parameter estimates for a harmonic audio signal with rapid frequency modulation [4]. A fast version of the FChT [3] reduces computation but this algorithm presents two factors that affect sinewave parameter estimation. The phase of the fast FChT does not match the phase of the original continuous-time transform and this interferes with the estimation of sinewave phases. Also, the fast FChT requires an interpolation of the input signal and the choice of interpolator affects the speed of the transform and accuracy of the estimated sinewave parameters. In this paper we demonstrate how to modify the phase of the fast FChT such that it can be used to estimate sinewave phases, and we explore the use of various interpolators demonstrating the tradeoff between transform speed and sinewave parameter accuracy.
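One common way to realize a fast fan-chirp analysis, and the one that makes the abstract's interpolation tradeoff concrete, is to resample the signal on a warped time grid and then apply an ordinary FFT. The sketch below assumes the warping phi(t) = (1 + alpha*t/2)*t with linear interpolation (np.interp); the normalization and function names are illustrative, not the paper's implementation:

```python
import numpy as np

def fast_fcht(x, fs, alpha, interp=np.interp):
    """Sketch of fan-chirp analysis via time warping: resample x on a
    uniform grid of the warped time phi(t) = (1 + 0.5*alpha*t)*t, then
    take an FFT. The interpolator choice (linear here) trades transform
    speed against sinewave parameter accuracy."""
    n = len(x)
    t = np.arange(n) / fs
    phi = (1.0 + 0.5 * alpha * t) * t
    uniform = np.linspace(phi[0], phi[-1], n)
    # sqrt(|phi'(t)|) amplitude factor keeps the transform energy-consistent.
    warped = interp(uniform, phi, x * np.sqrt(np.abs(1.0 + alpha * t)))
    return np.fft.rfft(warped)
```

With alpha = 0 the warp is the identity and the result reduces to a plain FFT; a nonzero alpha sharpens harmonics of a signal whose pitch ramps at the matching rate.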

Towards co-channel speaker separation by 2-D demodulation of spectrograms

Published in:
WASPAA 2009, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 18-21 October 2009, pp. 65-68.

Summary

This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model based on pitch information for single and multi-speaker content in each region. Our model maps harmonically-related speech content to concentrated entities in a transformed 2-D space, thereby motivating 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of individual speakers, we show through a quantitative evaluation: 1) Utility of the model for representing speech content of a single speaker and 2) Its feasibility for speaker separation. For the separation task, we also illustrate benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system.
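The analysis step described above, taking a 2-D Fourier transform of a localized patch of a narrowband spectrogram, can be sketched as follows. Window length, hop, and patch coordinates are illustrative choices, not the paper's settings:

```python
import numpy as np

def stft_mag(x, win_len=256, hop=64):
    """Narrowband spectrogram magnitude via a Hann-windowed STFT."""
    win = np.hanning(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # freq x time

def local_2d_transform(spec, f0, t0, size=32):
    """2-D Fourier magnitude of one localized time-frequency region.
    Harmonically related content in the patch maps to concentrated
    peaks in this transformed space, which is what motivates the
    demodulation step in the paper."""
    patch = spec[f0:f0 + size, t0:t0 + size]
    return np.abs(np.fft.fftshift(np.fft.fft2(patch)))
```

Each localized region is processed independently, so pitch dynamics within the patch shape where the concentrated entities land in the 2-D space.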

A log-frequency approach to the identification of the Wiener-Hammerstein model

Published in:
IEEE Sig. Proc. Lett., Vol. 16, No. 10, October 2009, pp. 889-892.

Summary

In this paper we present a simple closed-form solution to the Wiener-Hammerstein (W-H) identification problem. The identification process occurs in the log-frequency domain where magnitudes and phases are separable. We show that the theoretically optimal W-H identification is unique up to an amplitude, phase and delay ambiguity, and that the nonlinearity enables the separate identification of the individual linear time invariant (LTI) components in a W-H architecture.
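The amplitude ambiguity the abstract mentions is easy to see by simulation: scaling the first LTI filter by a constant and absorbing the inverse scale into the static nonlinearity leaves the W-H output unchanged. A small sketch (filter taps and the cubic nonlinearity are illustrative, not from the paper):

```python
import numpy as np

def wiener_hammerstein(x, h1, nonlinearity, h2):
    """Simulate a W-H cascade: LTI filter h1, a memoryless
    nonlinearity, then LTI filter h2. The paper identifies h1 and h2
    from data; this sketch only generates such data."""
    return np.convolve(nonlinearity(np.convolve(x, h1)), h2)

h1 = np.array([1.0, 0.5])
h2 = np.array([1.0, -0.25])
x = np.random.default_rng(0).standard_normal(64)
y = wiener_hammerstein(x, h1, lambda v: v + 0.1 * v**3, h2)

# Scale h1 by c and compensate inside the nonlinearity: same output,
# so identification can only recover h1 up to this ambiguity.
c = 2.0
y_scaled = wiener_hammerstein(x, c * h1,
                              lambda v: (v / c) + 0.1 * (v / c)**3, h2)
```

An analogous argument holds for the phase and delay ambiguities, which is why the log-frequency identification is unique only up to those factors.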

2-D processing of speech for multi-pitch analysis

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

This paper introduces a two-dimensional (2-D) processing approach for the analysis of multi-pitch speech sounds. Our framework invokes the short-space 2-D Fourier transform magnitude of a narrowband spectrogram, mapping harmonically related signal components to multiple concentrated entities in a new 2-D space. First, localized time-frequency regions of the spectrogram are analyzed to extract pitch candidates. These candidates are then combined across multiple regions for obtaining separate pitch estimates of each speech-signal component at a single point in time. We refer to this as multi-region analysis (MRA). By explicitly accounting for pitch dynamics within localized time segments, this separability is distinct from that which can be obtained using short-time autocorrelation methods typically employed in state-of-the-art multi-pitch tracking algorithms. We illustrate the feasibility of MRA for multi-pitch estimation on mixtures of synthetic and real speech.

A comparison of query-by-example methods for spoken term detection

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

In this paper we examine an alternative interface for phonetic search, namely query-by-example, that avoids OOV issues associated with both standard word-based and phonetic search methods. We develop three methods that compare query lattices derived from example audio against a standard n-gram-based phonetic index, and we analyze factors affecting the performance of these systems. We show that the best systems under this paradigm are able to achieve 77% precision when retrieving utterances from conversational telephone speech and returning 10 results from a single query (performance that is better than a similar dictionary-based approach), suggesting significant utility for applications requiring high precision. We also show that these systems can be further improved using relevance feedback: by incorporating four additional queries, the precision of the best system can be improved by 13.7% relative. Our systems perform well despite high phone recognition error rates (> 40%) and make use of no pronunciation or letter-to-sound resources.

A framework for discriminative SVM/GMM systems for language recognition

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

Language recognition with support vector machines and shifted-delta cepstral features has been an excellent performer in NIST-sponsored language evaluation for many years. A novel improvement of this method has been the introduction of hybrid SVM/GMM systems. These systems use GMM supervectors as an SVM expansion for classification. In prior work, methods for scoring SVM/GMM systems have been introduced based upon either standard SVM scoring or GMM scoring with a pushed model. Although prior work showed experimentally that GMM scoring yielded better results, no framework was available to explain the connection between SVM scoring and GMM scoring. In this paper, we show that there are interesting connections between SVM scoring and GMM scoring. We provide a framework both theoretically and experimentally that connects the two scoring techniques. This connection should provide the basis for further research in SVM discriminative training for GMM models.
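The GMM supervector expansion the abstract refers to is, in its usual form, the stacked MAP-adapted means of a universal background model. A minimal sketch, assuming a diagonal-covariance UBM is already given; the relevance factor and shapes are illustrative:

```python
import numpy as np

def map_adapt_supervector(frames, means, weights, var, r=16.0):
    """Sketch of a GMM mean supervector: compute frame posteriors
    under a diagonal-covariance UBM, MAP-adapt the component means,
    and stack them into one vector usable as an SVM feature."""
    # Log-likelihood of each frame under each mixture component.
    diff = frames[:, None, :] - means[None, :, :]           # (T, M, D)
    ll = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2)
    ll += np.log(weights)
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                 # (T, M)
    n = post.sum(axis=0)                                    # occupancy
    ex = post.T @ frames / np.maximum(n[:, None], 1e-10)    # 1st moments
    alpha = (n / (n + r))[:, None]                          # MAP weight
    adapted = alpha * ex + (1 - alpha) * means
    return adapted.ravel()
```

Scoring such vectors with an SVM versus scoring frames against a "pushed" GMM built from the SVM solution is exactly the pair of techniques the paper connects.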

Discriminative N-gram selection for dialect recognition

Summary

Dialect recognition is a challenging and multifaceted problem. Distinguishing between dialects can rely upon many tiers of interpretation of speech data - e.g., prosodic, phonetic, spectral, and word. High-accuracy automatic methods for dialect recognition typically rely upon either phonetic or spectral characteristics of the input. A challenge with spectral systems, such as those based on shifted-delta cepstral coefficients, is that they achieve good performance but do not provide insight into distinctive dialect features. In this work, a novel method based upon discriminative training and phone N-grams is proposed. This approach achieves excellent classification performance, fuses well with other systems, and has interpretable dialect characteristics in the phonetic tier. The method is demonstrated on data from the LDC and prior NIST language recognition evaluations. The method is also combined with spectral methods to demonstrate state-of-the-art performance in dialect recognition.
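The phone N-gram features behind such an approach amount to counting short phone sequences in a recognizer's output, which can then feed a discriminatively trained classifier. A minimal sketch (the phone symbols are examples, not a specific recognizer's inventory):

```python
from collections import Counter
from itertools import islice

def phone_ngram_counts(phones, n=2):
    """Count phone n-grams in a phone-recognizer transcript; the
    resulting sparse counts serve as classifier features."""
    grams = zip(*(islice(phones, i, None) for i in range(n)))
    return Counter(grams)

counts = phone_ngram_counts(["ax", "b", "aw", "t", "ax"], n=2)
```

Because each feature is a readable phone sequence, the weights a discriminative classifier assigns to them are directly interpretable as dialect cues, which is the interpretability advantage claimed above.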

Large-scale analysis of formant frequency estimation variability in conversational telephone speech

Published in:
INTERSPEECH 2009, 6-10 September 2009.

Summary

We quantify how the telephone channel and regional dialect influence formant estimates extracted from Wavesurfer in spontaneous conversational speech from over 3,600 native American English speakers. To the best of our knowledge, this is the largest-scale study on this topic. We found that F1 estimates are higher in cellular channels than in landline, while F2 in general shows the opposite trend. We also characterized vowel shift trends in the northern states of the U.S. and compared them with the Northern Cities Chain Shift (NCCS). Our analysis is useful in forensic applications where it is important to distinguish between speaker, dialect, and channel characteristics.

The MIT Lincoln Laboratory 2008 speaker recognition system

Summary

In recent years, methods for modeling and mitigating variational nuisances have been introduced and refined. A primary emphasis in this year's NIST 2008 Speaker Recognition Evaluation (SRE) was to greatly expand the use of auxiliary microphones. This introduced additional channel variations, which have been a historical challenge to speaker verification systems. In this paper we present the MIT Lincoln Laboratory speaker recognition system applied to the task in the NIST 2008 SRE. Our approach during the evaluation was two-fold: 1) utilize recent advances in variational nuisance modeling (latent factor analysis and nuisance attribute projection) to allow our spectral speaker verification systems to better compensate for the channel variation introduced, and 2) fuse systems targeting the different linguistic tiers of information, high and low. The performance of the system is presented when applied to the NIST 2008 SRE task. Post-evaluation analysis is conducted on the sub-task where interview microphones are present.