Publications

GROK: a practical system for securing group communications

Published in:
NCA 2010, 9th IEEE Int. Symp. on Network Computing and Applications, 15 July 2010, pp. 100-107.

Summary

We have designed and implemented a general-purpose cryptographic building block, called GROK, for securing communication among groups of entities in networks composed of high-latency, low-bandwidth, intermittently connected links. During the process, we solved a number of non-trivial system problems. This paper describes these problems and our solutions, and motivates and justifies these solutions from three viewpoints: usability, efficiency, and security. The solutions described in this paper have been tempered by securing a widely-used group-oriented application, group text chat. We implemented a prototype extension to a popular text chat client called Pidgin and evaluated it in a real-world scenario. Based on our experiences, these solutions are useful to designers of group-oriented systems specifically, and secure systems in general.

Weighted nuisance attribute projection

Published in:
Odyssey 2010, the Speaker and Language Recognition Workshop, 28 June - 1 July 2010.

Summary

Nuisance attribute projection (NAP) has become a common method for compensation of channel effects, session variation, speaker variation, and general mismatch in speaker recognition. NAP uses an orthogonal projection to remove a nuisance subspace from a larger expansion space that contains the speaker information. Training the NAP subspace is based on optimizing pairwise distances to reduce intraspeaker variability and retain interspeaker variability. In this paper, we introduce a novel form of NAP called weighted NAP (WNAP), which significantly extends the current methodology. For WNAP, we propose a training criterion that incorporates two critical extensions to NAP: variable metrics and instance-weighted training. Both an eigenvector method and an iterative method are proposed for solving the resulting optimization problem. The effectiveness of WNAP is shown on a NIST speaker recognition evaluation task where error rates are reduced by over 20%.
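
The orthogonal projection mentioned in the summary can be illustrated with a minimal sketch. It shows only the standard NAP projection P = I - U U^T applied to one vector, not the weighted (WNAP) training criterion from the paper. The function names, the three-dimensional vector, and the one-dimensional nuisance basis are hypothetical values chosen for illustration, and the basis vectors are assumed to be orthonormal.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nap_project(x, nuisance_basis):
    """Apply P = I - U U^T to x.

    nuisance_basis is a list of orthonormal vectors spanning the
    nuisance subspace (an assumption of this sketch).
    """
    out = list(x)
    for u in nuisance_basis:
        coeff = dot(u, x)  # component of x along this nuisance direction
        out = [o - coeff * ui for o, ui in zip(out, u)]
    return out


# Hypothetical expansion vector and one-dimensional nuisance subspace.
U = [[1.0, 0.0, 0.0]]
x = [2.0, 3.0, -1.0]
y = nap_project(x, U)
print(y)
```

After projection, the vector has no component along the nuisance direction, and applying the projection a second time leaves it unchanged.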

Voice production mechanisms following phonosurgical treatment of early glottic cancer

Published in:
Ann. Otol. Rhinol. Laryngol., Vol. 119, No. 1, 2010, pp. 1-9.

Summary

Although near-normal conversational voices can be achieved with the phonosurgical management of early glottic cancer, there are still acoustic and aerodynamic deficits in vocal function that must be better understood to help further optimize phonosurgical interventions. Stroboscopic assessment is inadequate for this purpose. A newly developed color high-speed videoendoscopy (HSV) system that included time-synchronized recordings of the acoustic signal was used to perform a detailed examination of voice production mechanisms in 14 subjects. Digital image processing techniques were used to quantify glottal phonatory function and to delineate relationships between vocal fold vibratory properties and acoustic perturbation measures. [not complete]

The application of statistical relational learning to a database of criminal and terrorist activity

Published in:
SIAM Conf. on Data Mining, 29 April - 1 May 2010.

Summary

We apply statistical relational learning to a database of criminal and terrorist activity to predict attributes and event outcomes. The database stems from a collection of news articles and court records that are carefully annotated with a variety of variables, including categorical and continuous fields. Manual analysis of this data can help inform decision makers seeking to curb violent activity within a region. We use this data to build relational models from historical data to predict attributes of groups, individuals, or events. Our first example involves predicting social network roles within a group under a variety of different data conditions. Collective classification can be used to boost the accuracy under data-poor conditions. Additionally, we were able to predict the outcome of hostage negotiations using models trained on previous kidnapping events. The overall framework and techniques described here are flexible enough to be used to predict a variety of variables. Such predictions could be used as input to a more complex system to recognize intent of terrorist groups or as input to inform human decision makers.

Data diodes in support of trustworthy cyber infrastructure

Published in:
6th Annual Cyber Security and Information Intelligence Research Workshop, Cyber Security and Information Intelligence Challenges and Strategies, CSIIRW10, 21 April 2010.

Summary

Interconnections between process control networks and enterprise networks have resulted in the proliferation of standard communication protocols in industrial control systems, which exposes instrumentation, control systems, and the critical infrastructure components they operate to a variety of cyber attacks. Various standards and technologies have been proposed to protect industrial control systems against cyber attacks and to provide them with confidentiality, integrity, and availability. Among these technologies, data diodes protect critical systems by physically enforcing traffic direction on the network. In order to deploy data diodes effectively, it is imperative to understand the protection they provide, the protection they do not provide, their limitations, and their place in the larger security infrastructure. In this work, we briefly review the security challenges in an industrial control system, study data diodes, their functionalities and limitations, and propose a scheme for their effective deployment in trusted process control networks (TPCNs).

Detection and simulation of scenarios with hidden Markov models and event dependency graphs

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 5434-5437.

Summary

The wide availability of signal processing and language tools to extract structured data from raw content has created a new opportunity for the processing of structured signals. In this work, we explore models for the simulation and recognition of scenarios - i.e., time sequences of structured data. For simulation, we construct two models - hidden Markov models (HMMs) and event dependency graphs. Combined, these two simulation methods allow the specification of dependencies in event ordering, simultaneous execution of multiple scenarios, and evolving networks of data. For scenario recognition, we consider the application of multi-grained HMMs. We explore, in detail, mismatch between training scenarios and simulated test scenarios. The methods are applied to terrorist scenario detection with a simulation coded by a subject matter expert.

Preserving the character of perturbations in scaled pitch contours

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 417-420.

Summary

The global and fine dynamic components of a pitch contour in voice production, as in the speaking and singing voice, are important for both the meaning and character of an utterance. In speech, for example, slow pitch inflections, rapid pitch accents, and irregular regions all comprise the pitch contour. In applications where all components of a pitch contour are stretched or compressed in the same way, as for example in time-scale modification, an unnatural scaled contour may result. In this paper, we develop a framework for scaling pitch contours, motivated by the goal of maintaining naturalness in time-scale modification of voice. Specifically, we develop a multi-band algorithm to independently modify the slow trajectory and fast perturbation components of a contour for a more natural synthesis, and we present examples where pitch contours representative of speaking and singing voice are lengthened. In the speaking voice, the frequency content of flutter or irregularity is maintained, while slow pitch inflection is simply stretched or compressed. In the singing voice, rapid vibrato is preserved while slower note-to-note variation is scaled as desired.

Multi-class SVM optimization using MCE training with application to topic identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 5350-5353.

Summary

This paper presents a minimum classification error (MCE) training approach for improving the accuracy of multi-class support vector machine (SVM) classifiers. We have applied this approach to topic identification (topic ID) for human-human telephone conversations from the Fisher corpus using ASR lattice output. The new approach yields improved performance over the traditional techniques for training multi-class SVM classifiers on this task.

Kalman filter based speech synthesis

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 4618-4621.

Summary

Preliminary results are reported from a very simple speech-synthesis system based on clustered-diphone Kalman filter modeling of line-spectral-frequency features. Parameters were estimated using maximum-likelihood EM training, with an enforced constraint that prevented eigenvalue magnitudes in the transition matrix from exceeding 1. Frames of training data were assigned diphone unit labels by forced alignment with an HMM recognition system. The HMM cluster tree was also used for Kalman filter unit cluster assignments. The result is a simple synthesis system that has few parameters, synthesizes intelligible speech without audible discontinuities, and can be adapted using MLLR techniques to support synthesis of a broad panoply of speakers from a single base model with small amounts of training data. The result is interesting for embedded synthesis applications.

The MITLL NIST LRE 2009 language recognition system

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 4994-4997.

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2009 Language Recognition Evaluation (LRE). This system consists of a fusion of three core recognizers, two based on spectral similarity and one based on tokenization. The 2009 LRE differed from previous ones in that test data included narrowband segments from worldwide Voice of America broadcasts as well as conventional recorded conversational telephone speech. Results are presented for the 23-language closed-set and open-set detection tasks at the 30-, 10-, and 3-second durations, along with a discussion of the language-pair task. On the 30-second, 23-language closed-set detection task, the system achieved a 1.64 average error rate.