Publications

Refine Results

(Filters Applied) Clear All

Toward signal processing theory for graphs and non-Euclidean data

Published in:
ICASSP 2010, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 15 March 2010, pp. 5415-5417.

Summary

Graphs are canonical examples of high-dimensional non-Euclidean data sets, and are emerging as a common data structure in many fields. While there are many algorithms to analyze such data, a signal processing theory for evaluating these techniques akin to detection and estimation in the classical Euclidean setting remains to be developed. In this paper we show the conceptual advantages gained by formulating graph analysis problems in a signal processing framework by way of a practical example: detection of a subgraph embedded in a background graph. We describe an approach based on detection theory and provide empirical results indicating that the test statistic proposed has reasonable power to detect dense subgraphs in large random graphs.
READ LESS

Summary

Graphs are canonical examples of high-dimensional non-Euclidean data sets, and are emerging as a common data structure in many fields. While there are many algorithms to analyze such data, a signal processing theory for evaluating these techniques akin to detection and estimation in the classical Euclidean setting remains to be...

READ MORE

A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 5014-5017.

Summary

We propose supervised and unsupervised learning algorithms to extract dialect discriminating phonetic rules and use these rules to adapt biphones to identify dialects. Despite many challenges (e.g., sub-dialect issues and no word transcriptions), we discovered dialect discriminating biphones compatible with the linguistic literature, while outperforming a baseline monophone system by 7.5% (relative). Our proposed dialect discriminating biphone system achieves similar performance to a baseline all-biphone system despite using 25% fewer biphone models. In addition, our system complements PRLM (Phone Recognition followed by Language Modeling), verified by obtaining relative gains of 15-29% when fused with PRLM. Our work is an encouraging first step towards a linguistically-informative dialect recognition system, with potential applications in forensic phonetics, accent training, and language learning.
READ LESS

Summary

We propose supervised and unsupervised learning algorithms to extract dialect discriminating phonetic rules and use these rules to adapt biphones to identify dialects. Despite many challenges (e.g., sub-dialect issues and no word transcriptions), we discovered dialect discriminating biphones compatible with the linguistic literature, while outperforming a baseline monophone system by...

READ MORE

High-pitch formant estimation by exploiting temporal change of pitch

Published in:
Proc. IEEE Trans. on Audio, Speech, and Language Processing, Vol. 18, No. 1, January 2010, pp. 171-186.

Summary

This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological studies implicating the use of pitch dynamics in speech by humans. We develop and assess signal processing schemes aimed at exploiting temporal change of pitch to address the high-pitch formant frequency estimation problem. Specifically, we propose a 2-D analysis framework using 2-D transformations of the time-frequency space. In one approach, we project changing spectral harmonics over time to a 1-D function of frequency. In a second approach, we draw upon previous work of Quatieri and Ezzat et al. [1], [2], with similarities to the auditory modeling efforts of Chi et al. [3], where localized 2-D Fourier transforms of the time-frequency space provide improved source-filter separation when pitch is changing. Our methods show quantitative improvements for synthesized vowels with stationary formant structure in comparison to traditional and homomorphic linear prediction. We also demonstrate the feasibility of applying our methods on stationary vowel regions of natural speech spoken by high-pitch females of the TIMIT corpus. Finally, we show improvements afforded by the proposed analysis framework in formant tracking on examples of stationary and time-varying formant structure.
READ LESS

Summary

This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological studies implicating the use of pitch dynamics in speech by humans. We develop and assess signal processing...

READ MORE

3-d graph processor

Summary

Graph algorithms are used for numerous database applications such as analysis of financial transactions, social networking patterns, and internet data. While graph algorithms can work well with moderate size databases, processors often have difficulty providing sufficient throughput when the databases are large. This is because the processor architectures are poorly matched to the graph computational flow. For example, most modern processors utilize cache based memory in order to take advantage of highly localized memory access patterns. However, memory access patterns associated with graph processing are often random in nature and can result in high cache miss rates. In addition, graph algorithms require significant overhead computation for dealing with indices of vertices and edges.
READ LESS

Summary

Graph algorithms are used for numerous database applications such as analysis of financial transactions, social networking patterns, and internet data. While graph algorithms can work well with moderate size databases, processors often have difficulty providing sufficient throughput when the databases are large. This is because the processor architectures are poorly...

READ MORE

Query-by-example spoken term detection using phonetic posteriorgram templates

Published in:
Proc. IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU, 13-17 December 2009, pp. 421-426.

Summary

This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments using this approach are presented using data from the Fisher corpus.
READ LESS

Summary

This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user...

READ MORE

ASE: authenticated statement exchange

Published in:
2010 9th IEEE Int. Symp. on Network Computing and Applications, 7 December 2009, pp. 155-161.

Summary

Applications often re-transmit the same data, such as digital certificates, during repeated communication instances. Avoiding such superfluous transmissions with caching, while complicated, may be necessary in order to operate in low-bandwidth, high-latency wireless networks or in order to reduce communication load in shared, mobile networks. This paper presents a general framework and an accompanying software library, called "Authenticated Statement Exchange" (ASE), for helping applications implement persistent caching of application specific data. ASE supports secure caching of a number of predefined data types common to secure communication protocols and allows applications to define new data types to be handled by ASE. ASE is applicable to many applications. The paper describes the use of ASE in one such application, secure group chat. In a recent real-use deployment, ASE was instrumental in allowing secure group chat to operate over low-bandwidth satellite links.
READ LESS

Summary

Applications often re-transmit the same data, such as digital certificates, during repeated communication instances. Avoiding such superfluous transmissions with caching, while complicated, may be necessary in order to operate in low-bandwidth, high-latency wireless networks or in order to reduce communication load in shared, mobile networks. This paper presents a general...

READ MORE

Modeling modern network attacks and countermeasures using attack graphs

Published in:
ACSAC 2009, Annual Computer Security Applications Conf., 7 December 2009, pp. 117-126.

Summary

By accurately measuring risk for enterprise networks, attack graphs allow network defenders to understand the most critical threats and select the most effective countermeasures. This paper describes substantial enhancements to the NetSPA attack graph system required to model additional present-day threats (zero-day exploits and client-side attacks) and countermeasures (intrusion prevention systems, proxy firewalls, personal firewalls, and host-based vulnerability scans). Point-to-point reachability algorithms and structures were extensively redesigned to support "reverse" reachability computations and personal firewalls. Host-based vulnerability scans are imported and analyzed. Analysis of an operational network with 85 hosts demonstrates that client-side attacks pose a serious threat. Experiments on larger simulated networks demonstrated that NetSPA's previous excellent scaling is maintained. Less than two minutes are required to completely analyze a four-enclave simulated network with more than 40,000 hosts protected by personal firewalls.
READ LESS

Summary

By accurately measuring risk for enterprise networks, attack graphs allow network defenders to understand the most critical threats and select the most effective countermeasures. This paper describes substantial enhancements to the NetSPA attack graph system required to model additional present-day threats (zero-day exploits and client-side attacks) and countermeasures (intrusion prevention...

READ MORE

Speaker comparison with inner product discriminant functions

Published in:
Neural Information Processing Symp., 7 December 2009.

Summary

Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications - speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison - support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.
READ LESS

Summary

Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications - speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech...

READ MORE

The MIT-LL/AFRL IWSLT-2008 MT System

Published in:
Int. Workshop on Spoken Language Translation, IWSLT, 1-2 December 2009.

Summary

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2008 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance for both text and speech-based translation on Chinese and Arabic translation tasks. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2007 system, and experiments we ran during the IWSLT-2008 evaluation. Specifically, we focus on 1) novel segmentation models for phrase-based MT, 2) improved lattice and confusion network decoding of speech input, 3) improved Arabic morphology for MT preprocessing, and 4) system combination methods for machine translation.
READ LESS

Summary

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2008 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance for both text and speech-based translation on Chinese and Arabic...

READ MORE

A multi-sensor compressed sensing receiver: performance bounds and simulated results

Published in:
43rd Asilomar Conf. on Signals, Systems, and Computers, 1-4 November 2009, pp. 1571-1575.

Summary

Multi-sensor receivers are commonly tasked with detecting, demodulating and geolocating target emitters over very wide frequency bands. Compressed sensing can be applied to persistently monitor a wide bandwidth, given that the received signal can be represented using a small number of coefficients in some basis. In this paper we present a multi-sensor compressive sensing receiver that is capable of reconstructing frequency-sparse signals using block reconstruction techniques in a sensor-frequency basis. We derive performance bounds for time-difference and angle of arrival (AoA) estimation of such a receiver, and present simulated results in which we compare AoA reconstruction performance to the bounds derived.
READ LESS

Summary

Multi-sensor receivers are commonly tasked with detecting, demodulating and geolocating target emitters over very wide frequency bands. Compressed sensing can be applied to persistently monitor a wide bandwidth, given that the received signal can be represented using a small number of coefficients in some basis. In this paper we present...

READ MORE