Publications


Time-varying autoregressive tests for multiscale speech analysis

Published in:
INTERSPEECH 2009, 10th Annual Conf. of the International Speech Communication Association, pp. 2839-2842.

Summary

In this paper we develop hypothesis tests for speech waveform nonstationarity based on time-varying autoregressive models, and demonstrate their efficacy in speech analysis tasks at both segmental and sub-segmental scales. Key to the successful synthesis of these ideas is our employment of a generalized likelihood ratio testing framework tailored to autoregressive coefficient evolutions suitable for speech. After evaluating our framework on speech-like synthetic signals, we present preliminary results for two distinct analysis tasks using speech waveform data. At the segmental level, we develop an adaptive short-time segmentation scheme and evaluate it on whispered speech recordings, while at the sub-segmental level, we address the problem of detecting the glottal flow closed phase. Results show that our hypothesis testing framework can reliably detect changes in the vocal tract parameters across multiple scales, thereby underscoring its broad applicability to speech analysis.
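The paper's generalized likelihood ratio framework is tailored to coefficient evolutions specific to speech; as a rough illustration of the underlying idea only, the sketch below (all signal parameters invented) compares a single stationary AR fit over a frame against independent AR fits on its two halves, so the statistic grows when the vocal-tract-like dynamics change mid-frame:

```python
import numpy as np

def ar_fit_rss(x, p):
    """Least-squares AR(p) fit; returns residual sum of squares and sample count."""
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    y = x[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coeffs
    return float(resid @ resid), len(y)

def glr_statistic(frame, p=4):
    """GLR for 'one stationary AR model' vs. 'separate AR fits per half-frame'."""
    mid = len(frame) // 2
    rss0, n0 = ar_fit_rss(frame, p)
    rss1, n1 = ar_fit_rss(frame[:mid], p)
    rss2, n2 = ar_fit_rss(frame[mid:], p)
    # 2x log-likelihood ratio under Gaussian innovations
    return n0 * np.log(rss0 / n0) - n1 * np.log(rss1 / n1) - n2 * np.log(rss2 / n2)

rng = np.random.default_rng(0)

def synth(a_first, a_second, n=400):
    """AR(2) signal whose coefficients switch halfway through the frame."""
    x = np.zeros(n)
    for t in range(2, n):
        a = a_first if t < n // 2 else a_second
        x[t] = a[0] * x[t - 1] + a[1] * x[t - 2] + rng.standard_normal()
    return x

stat_stationary = glr_statistic(synth([1.2, -0.7], [1.2, -0.7]))
stat_changing = glr_statistic(synth([1.2, -0.7], [0.2, 0.5]))
```

Thresholding such a statistic is the basic decision step; the paper's tests use structured coefficient trajectories rather than the abrupt half-frame switch assumed here.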

Variability compensated support vector machines applied to speaker verification

Published in:
INTERSPEECH 2009, Proc. of the 10th Annual Conf. of the International Speech Communication Association, 6-9 September 2009, pp. 1555-1558.

Summary

Speaker verification using SVMs has proven successful, specifically using the GSV Kernel [1] with nuisance attribute projection (NAP) [2]. Also, the recent popularity and success of joint factor analysis [3] has led to promising attempts to use speaker factors directly as SVM features [4]. NAP projection and the use of speaker factors with SVMs are methods of handling variability in SVM speaker verification: NAP by removing undesirable nuisance variability, and using the speaker factors by forcing the discrimination to be performed based on inter-speaker variability. These successes have led us to propose a new method we call variability compensated SVM (VCSVM) to handle both inter- and intra-speaker variability directly in the SVM optimization. This is done by adding a regularized penalty to the optimization that biases the normal to the hyperplane to be orthogonal to the nuisance subspace or alternatively to the complement of the subspace containing the inter-speaker variability. This bias will attempt to ensure that inter-speaker variability is used in the recognition while intra-speaker variability is ignored. In this paper we present the theory and promising results on nuisance compensation.
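The idea of biasing the hyperplane normal away from a nuisance subspace can be shown on a toy problem. The sketch below is not the paper's formulation or data: the dataset, the penalty weight, the subgradient training loop, and the known nuisance basis `V` are all invented for illustration. It adds a term of the form lam * ||V^T w||^2 to a plain linear SVM objective:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: dimension 0 carries class (speaker) information,
# dimension 1 carries pure nuisance (e.g., channel) variability.
n, d = 200, 5
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
X = 0.3 * rng.standard_normal((n, d))
X[:, 0] += y                        # discriminative component
X[:, 1] += rng.standard_normal(n)   # nuisance component
V = np.zeros((d, 1))
V[1, 0] = 1.0                       # assumed-known nuisance subspace basis

def train(X, y, lam_nuis, C=1.0, lr=0.01, iters=2000):
    """Linear SVM by subgradient descent, with an extra penalty
    lam_nuis * ||V^T w||^2 that pushes w orthogonal to the nuisance subspace."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        margins = y * (X @ w)
        viol = margins < 1
        grad = w - C * (y[viol, None] * X[viol]).sum(axis=0) / len(y)
        grad = grad + 2.0 * lam_nuis * (V @ (V.T @ w))
        w = w - lr * grad
    return w

w_plain = train(X, y, lam_nuis=0.0)
w_vc = train(X, y, lam_nuis=10.0)
# The compensated normal places far less weight on the nuisance direction.
```

In the VCSVM setting this penalty sits inside the SVM dual/primal optimization itself; the toy above only demonstrates the direction of the bias.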

Automatic registration of LIDAR and optical images of urban scenes

Published in:
CVPR 2009, IEEE Conf. on Computer Vision and Pattern Recognition, 20-25 June 2009, pp. 2639-2646.

Summary

Fusion of 3D laser radar (LIDAR) imagery and aerial optical imagery is an efficient method for constructing 3D virtual reality models. One difficult aspect of creating such models is registering the optical image with the LIDAR point cloud, which is characterized as a camera pose estimation problem. We propose a novel application of mutual information registration methods, which exploits the statistical dependency in urban scenes of optical appearance with measured LIDAR elevation. We utilize the well-known downhill simplex optimization to infer camera pose parameters. We discuss three methods for measuring mutual information between LIDAR imagery and optical imagery. Utilization of OpenGL and graphics hardware in the optimization process yields registration times dramatically lower than previous methods. Using an initial registration comparable to GPS/INS accuracy, we demonstrate the utility of our algorithm with a collection of urban images and present 3D models created with the fused imagery.
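The core of mutual-information registration is that the correct alignment maximizes the statistical dependency between the two modalities. As a minimal sketch (not the paper's method: the images are synthetic, the pose is reduced to a 1-D integer shift, and exhaustive search stands in for the downhill simplex over camera pose parameters), the code below estimates MI from a joint histogram and finds the shift that maximizes it:

```python
import numpy as np

rng = np.random.default_rng(2)

def mutual_information(a, b, bins=16):
    """Histogram estimate of mutual information between two same-size images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Toy "LIDAR elevation" map and an "optical" image that depends on elevation
elev = rng.random((64, 64))
optical = np.tanh(3 * elev) + 0.05 * rng.standard_normal(elev.shape)

# Misregister the optical image by a known column shift, then search for it
obs = np.roll(optical, 3, axis=1)

def score(shift):
    return mutual_information(elev, np.roll(obs, -shift, axis=1))

best_shift = max(range(-8, 9), key=score)
```

A full registration replaces the shift with six camera pose parameters and renders the point cloud (the paper uses OpenGL for this) at each candidate pose before scoring.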

Compressed sensing arrays for frequency-sparse signal detection and geolocation

Published in:
Proc. of the 2009 DoD High Performance Computing Modernization Program Users Group Conf., HPCMP-UGC, 15 June 2009, pp. 297-301.

Summary

Compressed sensing (CS) can be used to monitor very wide bands when the received signals are sparse in some basis. We have developed a compressed sensing receiver architecture with the ability to detect, demodulate, and geolocate signals that are sparse in frequency. In this paper, we evaluate detection, reconstruction, and angle of arrival (AoA) estimation via Monte Carlo simulation and find that, using a linear 4-sensor array and undersampling by a factor of 8, we achieve near-perfect detection when the received signals occupy up to 5% of the bandwidth being monitored and have an SNR of 20 dB or higher. The signals in our band of interest include frequency-hopping signals, which are detected due to their consistent AoA. We compare CS array performance using sensor-frequency and space-frequency bases, and determine that using the sensor-frequency basis is more practical for monitoring wide bands. Though it requires that the received signals be sparse in frequency, the sensor-frequency basis still provides spatial information and is not affected by correlation between uncompressed basis vectors.
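The detection step rests on recovering a frequency-sparse signal from far fewer measurements than Nyquist samples. The sketch below is a generic, noiseless illustration, not the paper's receiver: the DCT sparsity basis, the Gaussian measurement matrix, and the two-tone signal are assumptions, and there is no array or AoA processing. It undersamples by 8x and recovers the active frequency bins with orthogonal matching pursuit:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 256, 32                  # ambient dimension; measurements (8x undersampling)
true_support = [20, 75]         # two active frequency bins (invented)

def dct_basis(N):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom."""
    n = np.arange(N)
    C = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / N) * np.sqrt(2.0 / N)
    C[:, 0] /= np.sqrt(2.0)
    return C

Psi = dct_basis(N)
s = np.zeros(N)
s[true_support] = [1.0, 0.8]
x = Psi @ s                                       # frequency-sparse signal

Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # random measurement matrix
y = Phi @ x                                       # compressed measurements
A = Phi @ Psi                                     # sensing matrix in sparse basis

def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse support recovery."""
    r, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ r))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef
    return sorted(support)

detected = omp(A, y, k=4)       # allow a couple of spurious picks
```

Detection then amounts to thresholding the recovered coefficients; the paper evaluates this under noise (20 dB SNR) and across sensor-frequency vs. space-frequency bases, which this toy does not model.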

Polyphase nonlinear equalization of time-interleaved analog-to-digital converters

Published in:
IEEE J. Sel. Top. Sig. Process., Vol. 3, No. 3, June 2009, pp. 362-373.

Summary

As the demand for higher data rates increases, commercial analog-to-digital converters (ADCs) are more commonly being implemented with multiple on-chip converters whose outputs are time-interleaved. The distortion generated by time-interleaved ADCs is now not only a function of the nonlinear behavior of the constituent circuitry, but also mismatches associated with interleaving multiple output streams. To mitigate distortion generated by time-interleaved ADCs, we have developed a polyphase NonLinear EQualizer (pNLEQ) which is capable of simultaneously mitigating distortion generated by both the on-chip circuitry and mismatches due to time interleaving. In this paper, we describe the pNLEQ architecture and present measurements of its performance.
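The per-phase structure of a polyphase equalizer can be illustrated with a toy model: train one small polynomial corrector for each interleaved sub-ADC, so that each phase's gain mismatch and nonlinearity is handled by its own branch. Everything below is an invented illustration (memoryless cubic distortion, a known clean reference, least-squares fitting), not the pNLEQ architecture itself, which includes memory terms and operates on measured converter data:

```python
import numpy as np

n = 4096
t = np.arange(n)
clean = np.sin(0.07 * t) + 0.5 * np.sin(0.23 * t)   # reference signal

# Two interleaved sub-ADCs with mismatched gain and mild cubic distortion
adc = clean.copy()
adc[0::2] = 1.00 * adc[0::2] + 0.02 * adc[0::2] ** 3
adc[1::2] = 0.97 * adc[1::2] + 0.05 * adc[1::2] ** 3

def features(x):
    """Memoryless polynomial terms; a real pNLEQ would add memory taps."""
    return np.column_stack([x, x ** 2, x ** 3])

# Train one polynomial corrector per interleave phase (least squares)
correctors = []
for phase in range(2):
    A = features(adc[phase::2])
    c, *_ = np.linalg.lstsq(A, clean[phase::2], rcond=None)
    correctors.append(c)

# Apply each phase's corrector to its own sample stream
out = adc.copy()
for phase, c in enumerate(correctors):
    out[phase::2] = features(adc[phase::2]) @ c

err_before = np.sqrt(np.mean((adc - clean) ** 2))
err_after = np.sqrt(np.mean((out - clean) ** 2))
```

The polyphase decomposition is the key design choice: interleave mismatch appears as phase-dependent distortion, so a single shared equalizer cannot remove it while per-phase branches can.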

Machine translation for government applications

Published in:
Lincoln Laboratory Journal, Vol. 18, No. 1, June 2009, pp. 41-53.

Summary

The idea of a mechanical process for converting one human language into another can be traced to a letter written by René Descartes in 1629, and after nearly 400 years, this vision has not been fully realized. Machine translation (MT) using digital computers has been a grand challenge for computer scientists, mathematicians, and linguists since the first international conference on MT was held at the Massachusetts Institute of Technology in 1952. Currently, Lincoln Laboratory is achieving success in a highly focused research program that specializes in developing speech translation technology for limited language resource domains and in adapting foreign-language proficiency standards for MT evaluation. Our specialized research program is situated within a general framework for multilingual speech and text processing for government applications.

Advocate: a distributed architecture for speech-to-speech translation

Published in:
Lincoln Laboratory Journal, Vol. 18, No. 1, June 2009, pp. 54-65.

Summary

Advocate is a set of communications application programming interfaces and service wrappers that serve as a framework for creating complex and scalable real-time software applications from component processing algorithms. Advocate can be used for a variety of distributed processing applications, but was initially designed to use existing speech processing and machine translation components in the rapid construction of large-scale speech-to-speech translation systems. Many such speech-to-speech translation applications require real-time processing, and Advocate provides this speed with low-latency communication between services.

Advocate: a distributed voice-oriented computing architecture

Published in:
North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conf. (NAACL HLT 2009), 31 May - 5 June 2009.

Summary

Advocate is a lightweight and easy-to-use computing architecture that supports real-time, voice-oriented computing. It is designed to allow the combination of multiple speech and language processing components to create cohesive distributed applications. It is scalable, supporting local processing of all NLP/speech components when sufficient processing resources are available to one machine, or fully distributed/networked processing over an arbitrarily large compute structure when more compute resources are needed. Advocate is designed to operate in a large distributed test-bed in which an arbitrary number of NLP/speech services interface with an arbitrary number of Advocate client applications. In this configuration, each Advocate client application employs automatic service discovery, calling services as required.

Modeling and detection techniques for counter-terror social network analysis and intent recognition

Summary

In this paper, we describe our approach and initial results on modeling, detection, and tracking of terrorist groups and their intents based on multimedia data. While research on automated information extraction from multimedia data has yielded significant progress in areas such as the extraction of entities, links, and events, less progress has been made in the development of automated tools for analyzing the results of information extraction to "connect the dots." Hence, our Counter-Terror Social Network Analysis and Intent Recognition (CT-SNAIR) work focuses on development of automated techniques and tools for detection and tracking of dynamically-changing terrorist networks as well as recognition of capability and potential intent. In addition to obtaining and working with real data for algorithm development and test, we have a major focus on modeling and simulation of terrorist attacks based on real information about past attacks. We describe the development and application of a new Terror Attack Description Language (TADL), which is used as a basis for modeling and simulation of terrorist attacks. Examples are shown which illustrate the use of TADL and a companion simulator based on a Hidden Markov Model (HMM) structure to generate transactions for attack scenarios drawn from real events. We also describe our techniques for generating realistic background clutter traffic to enable experiments to estimate performance in the presence of a mix of data. An important part of our effort is to produce scenarios and corpora for use in our own research, which can be shared with a community of researchers in this area. We describe our scenario and corpus development, including specific examples from the September 2004 bombing of the Australian embassy in Jakarta and a fictitious scenario which was developed in a prior project for research in social network analysis. The scenarios can be created by subject matter experts using a graphical editing tool.
Given a set of time ordered transactions between actors, we employ social network analysis (SNA) algorithms as a filtering step to divide the actors into distinct communities before determining intent. This helps reduce clutter and enhances the ability to determine activities within a specific group. For modeling and simulation purposes, we generate random networks with structures and properties similar to real-world social networks. Modeling of background traffic is an important step in generating classifiers that can separate harmless activities from suspicious activity. An algorithm for recognition of simulated potential attack scenarios in clutter based on Support Vector Machine (SVM) techniques is presented. We show performance examples, including probability of detection versus probability of false alarm tradeoffs, for a range of system parameters.
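An HMM-based transaction simulator of the kind described can be sketched in a few lines: hidden states represent attack phases, and each phase emits transaction types with its own probabilities. The state names, transaction types, and all probabilities below are invented for illustration; this is not TADL or the CT-SNAIR simulator:

```python
import numpy as np

rng = np.random.default_rng(5)

states = ["planning", "acquisition", "execution"]       # hypothetical phases
transactions = ["meeting", "purchase", "travel", "phone_call"]

# Left-to-right transition matrix: a scenario progresses through phases
T = np.array([[0.90, 0.10, 0.00],
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

# Emission probabilities: which transaction types each phase tends to produce
E = np.array([[0.50, 0.05, 0.05, 0.40],
              [0.10, 0.60, 0.20, 0.10],
              [0.05, 0.10, 0.60, 0.25]])

def simulate(steps=50):
    """Generate one time-ordered transaction sequence from the HMM."""
    s, seq = 0, []
    for _ in range(steps):
        seq.append(transactions[rng.choice(4, p=E[s])])
        s = int(rng.choice(3, p=T[s]))
    return seq

seq = simulate()
```

Sequences like `seq`, mixed with simulated background clutter, are the kind of input the paper's SVM-based detector is trained and evaluated on.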

Forensic speaker recognition: a need for caution

Summary

There has long been a desire to be able to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives, and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or innocence. Challenges, realities, and cautions regarding the use of speaker recognition applied to forensic-quality samples are presented.