Publications
Bridging the gap between linguists and technology developers: large-scale, sociolinguistic annotation for dialect and speaker recognition
Summary
Summary
Recent years have seen increased interest within the speaker recognition community in high-level features including, for example, lexical choice, idiomatic expressions or syntactic structures. The promise of speaker recognition in forensic applications drives development toward systems robust to channel differences by selecting features inherently robust to channel difference. Within the...
Retrieval and browsing of spoken content
Summary
Summary
Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range...
Adaptive short-time analysis-synthesis for speech enhancement
Summary
Summary
In this paper we propose a multiresolution short-time analysis method for speech enhancement. It is well known that fixed resolution methods such as the traditional short-time Fourier transform do not generally match the time-frequency structure of the signal being analyzed resulting in poor estimates of the speech and noise spectra...
A covariance kernel for SVM language recognition
Summary
Summary
Discriminative training for language recognition has been a key tool for improving system performance. In addition, recognition directly from shifted-delta cepstral features has proven effective. A recent successful example of this paradigm is SVM-based discrimination of languages based on GMM mean supervectors (GSVs). GSVs are created through MAP adaptation of...
A multi-class MLLR kernel for SVM speaker recognition
Summary
Summary
Speaker recognition using support vector machines (SVMs) with features derived from generative models has been shown to perform well. Typically, a universal background model (UBM) is adapted to each utterance yielding a set of features that are used in an SVM. We consider the case where the UBM is a...
Exploiting temporal change in pitch in formant estimation
Summary
Summary
This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological modeling studies implicating the use of temporal changes in speech by humans. Specifically, we develop and assess...
Language recognition with discriminative keyword selection
Summary
Summary
One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and...
Multisensor very low bit rate speech coding using segment quantization
Summary
Summary
We present two approaches to noise robust very low bit rate speech coding using wideband MELP analysis/synthesis. Both methods exploit multiple acoustic and non-acoustic input sensors, using our previously-presented dynamic waveform fusion algorithm to simultaneously perform waveform fusion, noise suppression, and crosschannel noise cancellation. One coder uses a 600 bps...
Improved GMM-based language recognition using constrained MLLR transforms
Summary
Summary
In this paper we describe the application of a feature-space transform based on constrained maximum likelihood linear regression for unsupervised compensation of channel and speaker variability to the language recognition problem. We show that use of such transforms can improve baseline GMM-based language recognition performance on the 2005 NIST Language...
Analytic theory of power law graphs
Summary
Summary
An analytical theory of power law graphs is presented basedon the Kronecker graph generation technique. The analysisuses Kronecker exponentials of complete bipartite graphsto formulate the sub-structure of such graphs. This allows various high level quantities (e.g. degree distribution,betweenness centrality, diameter, eigenvalues, and isoparametric ratio) to be computed directly from the...