Publications
Finding malicious cyber discussions in social media
Summary
Today's analysts manually examine social media networks to find discussions concerning planned cyber attacks, attacker techniques and tools, and potential victims. Applying modern machine learning approaches, Lincoln Laboratory has demonstrated the ability to automatically discover such discussions from Stack Exchange, Reddit, and Twitter posts written in English.
Iris biometric security challenges and possible solutions: for your eyes only? Using the iris as a key
Summary
Biometrics were originally developed for identification, such as for criminal investigations. More recently, biometrics have also been used for authentication. Most biometric authentication systems today match a user's biometric reading against a stored reference template generated during enrollment. If the reading and the template are sufficiently close, the authentication is...
Characterizing phonetic transformations and acoustic differences across English dialects
Summary
In this work, we propose a framework that automatically discovers dialect-specific phonetic rules. These rules characterize when certain phonetic or acoustic transformations occur across dialects. To explicitly characterize these dialect-specific rules, we adapt the conventional hidden Markov model to handle insertion and deletion transformations. The proposed framework is able to...
Analyzing and interpreting automatically learned rules across dialects
Summary
In this paper, we demonstrate how informative dialect recognition systems such as the acoustic pronunciation model (APM) help speech scientists locate and analyze phonetic rules efficiently. In particular, we analyze dialect-specific characteristics automatically learned from the APM across two American English dialects. We show that unsupervised rule retrieval performs similarly to supervised...
Assessing the speaker recognition performance of naive listeners using Mechanical Turk
Summary
In this paper we attempt to quantify the ability of naive listeners to perform speaker recognition in the context of the NIST evaluation task. We describe our protocol: a series of listening experiments using large numbers of naive listeners (432) on Amazon's Mechanical Turk that attempts to measure the ability...
Informative dialect recognition using context-dependent pronunciation modeling
Summary
We propose an informative dialect recognition system that learns phonetic transformation rules and uses them to identify dialects. A hidden Markov model is used to align reference phones with dialect-specific pronunciations to characterize when and how often substitutions, insertions, and deletions occur. Decision tree clustering is used to find...
USSS-MITLL 2010 human assisted speaker recognition
Summary
The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes and our fusion of these processes, which are adapted from...
Transcript-dependent speaker recognition using Mixer 1 and 2
Summary
Transcript-dependent speaker-recognition experiments are performed with the Mixer 1 and 2 read-transcription corpus using the Lincoln Laboratory speaker recognition system. Our analysis shows how widely speaker-recognition performance can vary on transcript-dependent data compared to conversational data of the same durations, given enrollment data from the same spontaneous conversational speech. A...
A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models
Summary
We propose supervised and unsupervised learning algorithms to extract dialect-discriminating phonetic rules and use these rules to adapt biphones to identify dialects. Despite many challenges (e.g., sub-dialect issues and no word transcriptions), we discovered dialect-discriminating biphones compatible with the linguistic literature, while outperforming a baseline monophone system by...
Large-scale analysis of formant frequency estimation variability in conversational telephone speech
Summary
We quantify how the telephone channel and regional dialect influence formant estimates extracted from Wavesurfer in spontaneous conversational speech from over 3,600 native American English speakers. To the best of our knowledge, this is the largest scale study on this topic. We found that F1 estimates are higher in cellular...