Publications
Triage framework for resource conservation in a speaker identification system
Summary
We present a novel framework for triaging (prioritizing and discarding) data to conserve resources for a speaker identification (SID) system. Our work is motivated by applications that require a SID system to process an overwhelming volume of audio data. We design a triage filter whose goal is to conserve recognizer...
Automatic language recognition via spectral and token based approaches
Summary
Automatic language recognition from speech consists of algorithms and techniques that model and classify the language being spoken. Current state-of-the-art language recognition systems fall into two broad categories: spectral- and token-sequence-based approaches. In this chapter, we describe algorithms for extracting features and models representing these types of language cues and...
Advanced language recognition using cepstra and phonotactics: MITLL system performance on the NIST 2005 Language Recognition Evaluation
Summary
This paper presents a description of the MIT Lincoln Laboratory submissions to the 2005 NIST Language Recognition Evaluation (LRE05). As was true in 2003, the 2005 submissions were combinations of core cepstral and phonotactic recognizers whose outputs were fused to generate final scores. For the 2005 evaluation, Lincoln Laboratory had...
Experiments with lattice-based PPRLM language identification
Summary
In this paper, we describe experiments conducted during the development of a lattice-based PPRLM language identification system as part of the NIST 2005 language recognition evaluation campaign. In experiments following LRE05, the PPRLM-lattice sub-system presented here achieved a 30s/primary condition EER of 4.87%, making it the single best-performing recognizer...
Support vector machines for speaker and language recognition
Summary
Support vector machines (SVMs) have proven to be a powerful technique for pattern classification. SVMs map inputs into a high-dimensional space and then separate classes with a hyperplane. A critical aspect of using SVMs successfully is the design of the inner product, the kernel, induced by the high-dimensional mapping...
Language recognition with support vector machines
Summary
Support vector machines (SVMs) have become a popular tool for discriminative classification. Powerful theoretical and computational tools for support vector machines have enabled significant improvements in pattern classification in several areas. An exciting area of recent application of support vector machines is in speech processing. A key aspect of applying...
Analysis of multitarget detection for speaker and language recognition
Summary
The general multitarget detection (or open-set identification) task is the intersection of the more common tasks of closed-set identification and open-set verification/detection. In this task, a bank of parallel detectors processes an input and must decide if the input is from one of the target classes and, if so, which...
Acoustic, phonetic, and discriminative approaches to automatic language identification
Summary
Formal evaluations conducted by NIST in 1996 demonstrated that systems that used parallel banks of tokenizer-dependent language models produced the best language identification performance. Since that time, other approaches to language identification have been developed that match or surpass the performance of phone-based systems. This paper describes and evaluates three...
Approaches to language identification using Gaussian mixture models and shifted delta cepstral features
Summary
Published results indicate that automatic language identification (LID) systems that rely on multiple-language phone recognition and n-gram language modeling produce the best performance in formal LID evaluations. By contrast, Gaussian mixture model (GMM) systems, which measure acoustic characteristics, are far more efficient computationally but have tended to provide inferior levels...
Speaker indexing in large audio databases using anchor models
Summary
This paper introduces the technique of anchor modeling in the applications of speaker detection and speaker indexing. The anchor modeling algorithm is refined by pruning the number of models needed. The system is applied to the speaker detection problem where its performance is shown to fall short of the state-of-the-art...