Publications


Extending the dynamic range of RF receivers using nonlinear equalization

Summary

Systems currently being developed to operate across wide bandwidths with high sensitivity requirements are limited by the inherent dynamic range of a receiver's analog and mixed-signal components. To increase a receiver's overall linearity, we have developed a digital NonLinear EQualization (NLEQ) processor capable of extending a receiver's dynamic range by one to three orders of magnitude. In this paper we describe the NLEQ architecture and present measurements of its performance.
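The abstract does not describe the NLEQ structure itself, but the general idea behind digital nonlinear equalization can be sketched with a toy model: a receiver chain introduces a mild third-order distortion, and a digital post-processor subtracts an estimate of that distortion. The channel model, the coefficient `a3`, and the test signal below are invented for illustration only.

```python
import numpy as np

# Toy distortion model: y = x + a3 * x**3 (illustrative, not the paper's model).
a3 = 0.05

def channel(x):
    return x + a3 * x**3

def nleq(y, a3_hat):
    # First-order inverse: subtract the estimated cubic distortion term.
    return y - a3_hat * y**3

t = np.linspace(0, 1, 1000, endpoint=False)
x = np.sin(2 * np.pi * 10 * t)        # clean input
y = channel(x)                         # distorted receiver output

err_before = np.max(np.abs(y - x))
err_after = np.max(np.abs(nleq(y, a3) - x))
print(err_after < err_before)          # residual distortion is reduced
```

The first-order inverse leaves a small fifth-order residual, which is why practical equalizers of this kind use richer (e.g., Volterra-based) structures.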

Cognitive services for the user

Published in:
Chapter 10, Cognitive Radio Technology, 2009, pp. 305-324.

Summary

Software-defined cognitive radios (CRs) use voice as a primary input/output (I/O) modality and are expected to have substantial computational resources capable of supporting advanced speech- and audio-processing applications. This chapter extends previous work on speech applications (e.g., [1]) to cognitive services that enhance military mission capability by capitalizing on automatic processes, such as speech information extraction and understanding the environment. Such capabilities go beyond interaction with the intended user of the software-defined radio (SDR) - they extend to speech and audio applications that can be applied to information that has been extracted from voice and acoustic noise gathered from other users and entities in the environment. For example, in a military environment, situational awareness and understanding could be enhanced by informing users based on processing voice and noise from both friendly and hostile forces operating in a given battle space. This chapter provides a survey of a number of speech- and audio-processing technologies and their potential applications to CR, including:

- A description of the technology and its current state of practice.
- An explanation of how the technology is currently being applied, or could be applied, to CR.
- Descriptions and concepts of operations for how the technology can be applied to benefit users of CRs.
- A description of relevant future research directions for both the speech and audio technologies and their applications to CR.

A pictorial overview of many of the core technologies with some applications presented in the following sections is shown in Figure 10.1. Also shown are some overlapping components between the technologies. For example, Gaussian mixture models (GMMs) and support vector machines (SVMs) are used in both speaker and language recognition technologies [2]. These technologies and components are described in further detail in the following sections.
Speech and concierge cognitive services and their corresponding applications are covered in the following sections. The services covered include speaker recognition, language identification (LID), text-to-speech (TTS) conversion, speech-to-text (STT) conversion, machine translation (MT), background noise suppression, speech coding, speaker characterization, noise management, noise characterization, and concierge services. These technologies and their potential applications to CR are discussed at varying levels of detail commensurate with their innovation and utility.

Gaussian mixture models

Published in:
Article in Encyclopedia of Biometrics, 2009, pp. 659-663. DOI: https://doi.org/10.1007/978-0-387-73003-5_196

Summary

A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model.
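The weighted-sum definition above can be made concrete with a small sketch that evaluates a one-dimensional mixture density p(x) = Σᵢ wᵢ N(x; μᵢ, σᵢ²). The component parameters and values below are illustrative, not from the article.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Evaluate a 1-D GMM density: a weighted sum of Gaussian component densities."""
    x = np.asarray(x, dtype=float)
    p = np.zeros_like(x)
    for w, mu, var in zip(weights, means, variances):
        p += w * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return p

# Two-component mixture; the weights sum to 1, so the density integrates to 1.
weights = [0.4, 0.6]
means = [-1.0, 2.0]
variances = [0.5, 1.0]

grid = np.linspace(-6, 8, 2001)
p = gmm_density(grid, weights, means, variances)
mass = np.sum(p) * (grid[1] - grid[0])   # numerical integral, ≈ 1.0
```

In practice the weights, means, and variances are not set by hand as here, but fitted from training data, e.g., with the EM algorithm mentioned in the abstract.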

High-productivity software development with pMATLAB

Published in:
Comput. Sci. Eng., Vol. 11, No. 1, January/February 2009, pp. 75-79.

Summary

In this paper, we explore the ease of tackling a communication-intensive parallel computing task - namely, the 2D fast Fourier transform (FFT). We start with a simple serial MATLAB code, explore in detail a 1D parallel FFT, and illustrate how it can be extended to multidimensional FFTs.
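The extension from 1D to 2D FFTs rests on the row/column decomposition: FFT every row, transpose (the communication step when rows are distributed across processors), FFT the rows of the transposed array, and transpose back. A serial Python sketch of that decomposition, not the pMATLAB code itself:

```python
import numpy as np

def fft2_by_rows(a):
    """2-D FFT via two passes of 1-D row FFTs separated by a transpose."""
    step1 = np.fft.fft(a, axis=1)        # 1-D FFTs along rows
    step2 = np.fft.fft(step1.T, axis=1)  # transpose, then 1-D FFTs along rows again
    return step2.T                       # transpose back to original orientation

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
print(np.allclose(fft2_by_rows(a), np.fft.fft2(a)))  # True
```

In the parallel setting, each processor owns a block of rows, so the two row-FFT passes are embarrassingly parallel and all communication is concentrated in the transpose.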

Low-resource speech translation of Urdu to English using semi-supervised part-of-speech tagging and transliteration

Published in:
SLT 2008, IEEE Spoken Language Technology Workshop 2008, 15-19 December 2008, pp. 265-268.

Summary

This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.

GROK secure multi-user chat at Red Flag 2007-03

Summary

This paper describes the GROK Secure Chat experimental activity performed by MIT Lincoln Laboratory at USAF Red Flag 2007-03 exercises and its results.

Efficient speech translation through confusion network decoding

Published in:
IEEE Trans. Audio Speech Lang. Proc., Vol. 16, No. 8, November 2008, pp. 1696-1705.

Summary

This paper describes advances in the use of confusion networks as interface between automatic speech recognition and machine translation. In particular, it presents a decoding algorithm for confusion networks which results as an extension of a state-of-the-art phrase-based text translation decoder. The confusion network decoder significantly improves both in efficiency and performance over previous work along this direction, and outperforms the background text translation system. Experimental results in terms of translation accuracy and decoding efficiency are reported for the task of translating plenary speeches of the European Parliament from Spanish to English and from English to Spanish.

A polyphase nonlinear equalization architecture and semi-blind identification method

Published in:
42nd Asilomar Conf. on Signals, Systems, and Computers, 27 October 2008, pp. 593-597.

Summary

In this paper, we present an architecture and semi-blind identification method for a polyphase nonlinear equalizer (pNLEQ). Such an equalizer is useful for extending the dynamic range of time-interleaved analog-to-digital converters (ADCs). Our proposed architecture is a polyphase extension to other architectures that partition the Volterra kernel into small nonlinear filters with relatively low computational complexity. Our semi-blind identification technique addresses important practical concerns in the equalizer identification process. We describe our architecture and demonstrate its performance with measured results when applied to a National Semiconductor ADC081000.

The cube coefficient subspace architecture for nonlinear digital predistortion

Published in:
42nd Asilomar Conf. on Signals, Systems, and Computers, 27 October 2008, pp. 1857-1861.

Summary

In this paper, we present the cube coefficient subspace (CCS) architecture for linearizing power amplifiers (PAs), which divides the overparametrized Volterra kernel into small, computationally efficient subkernels spanning only the portions of the full multidimensional coefficient space with the greatest impact on linearization. Using measured results from a Q-Band solid state PA, we demonstrate that the CCS predistorter architecture achieves better linearization performance than state-of-the-art memory polynomials and generalized memory polynomials.
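The memory polynomial mentioned above is the standard baseline for predistortion: the output is a sum of delayed input samples weighted by odd-order envelope nonlinearities, z[n] = Σ_{k,q} c_{k,q} x[n-q] |x[n-q]|^(k-1). A generic sketch of that baseline form follows; the coefficient layout and values are invented for illustration and this is not the CCS architecture itself.

```python
import numpy as np

def memory_polynomial(x, coeffs):
    """Generic memory-polynomial predistorter (sketch).

    coeffs[k_index] holds the delay taps for nonlinearity order k = 2*k_index + 1,
    i.e., the odd orders 1, 3, 5, ...
    """
    z = np.zeros_like(x)
    for k_index, taps in enumerate(coeffs):
        k = 2 * k_index + 1
        for q, c in enumerate(taps):
            xq = np.roll(x, q)                    # delayed copy (circular in this sketch)
            z += c * xq * np.abs(xq) ** (k - 1)   # c_{k,q} * x[n-q] * |x[n-q]|^(k-1)
    return z

# Illustrative use: a linear term plus a small corrective cubic term.
x = np.sin(2 * np.pi * 0.02 * np.arange(256))
z = memory_polynomial(x, coeffs=[[1.0, 0.0], [-0.1, 0.02]])
```

The CCS idea, as the abstract describes it, is to go beyond this structure by keeping only the Volterra-kernel subspaces that matter most for linearization, rather than a fixed polynomial form.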

Language, dialect, and speaker recognition using Gaussian mixture models on the cell processor

Published in:
Twelfth Annual High Performance Embedded Computing Workshop, HPEC 2008, 23-25 September 2008.

Summary

Automatic recognition systems are commonly used in speech processing to classify observed utterances by the speaker's identity, dialect, and language. These problems often require high processing throughput, especially in applications involving multiple concurrent incoming speech streams, such as in datacenter-level processing. Recent advances in processor technology allow multiple processors to reside within the same chip, allowing high performance per watt. Currently the Cell Broadband Engine has the leading performance-per-watt specifications in its class. Each Cell processor consists of a PowerPC Processing Element (PPE) working together with eight Synergistic Processing Elements (SPE). The SPEs have 256KB of memory (local store), which is used for storing both program and data. This paper addresses the implementation of language, dialect, and speaker recognition on the Cell architecture. Classically, the problem of performing speech-domain recognition has been approached as embarrassingly parallel, with each utterance being processed in parallel to the others. As we will discuss, efficient processing on the Cell requires a different approach, whereby computation and data for each utterance are subdivided to be handled by separate processors. We present a computational model for automatic recognition on the Cell processor that takes advantage of its architecture, while mitigating its limitations. Using the proposed design, we predict a system able to concurrently score over 220 real-time speech streams on a single Cell.
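The scheduling idea described above - subdividing each utterance's computation so the working set fits a small local store, rather than assigning whole utterances to processors - can be illustrated with a block-wise GMM scorer. The block size, model shapes, and data below are invented; this is a serial sketch of the data layout, not the Cell implementation.

```python
import numpy as np

def score_in_blocks(frames, means, inv_vars, log_weights, block=64):
    """Per-frame GMM log-likelihoods (diagonal covariances), computed block by block.

    Each block of `block` frames is scored independently, mimicking a work unit
    small enough for a limited local store.
    """
    out = []
    for start in range(0, len(frames), block):
        x = frames[start:start + block]                                   # (B, D)
        diff = x[:, None, :] - means[None, :, :]                          # (B, M, D)
        log_norm = 0.5 * np.sum(np.log(inv_vars / (2 * np.pi)), axis=1)  # (M,)
        exponent = -0.5 * np.sum(diff ** 2 * inv_vars[None, :, :], axis=2)
        log_comp = log_weights + log_norm + exponent                      # (B, M)
        out.append(np.logaddexp.reduce(log_comp, axis=1))                 # mixture sum
    return np.concatenate(out)

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 4))        # 100 feature frames, 4 dims
means = rng.standard_normal((3, 4))           # 3 mixture components
inv_vars = np.ones((3, 4))
log_weights = np.log(np.full(3, 1.0 / 3.0))
ll = score_in_blocks(frames, means, inv_vars, log_weights, block=16)
```

Because the blocks are independent, the block size only changes the working-set size per processor, never the scores - which is what makes this decomposition attractive on an architecture with small per-core memories.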