Publications

Refine Results

(Filters Applied) Clear All

The MITLL NIST LRE 2015 Language Recognition System

Summary

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks respectively.
READ LESS

Summary

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First...

READ MORE

A unified deep neural network for speaker and language recognition

Published in:
INTERSPEECH 2015: 15th Annual Conf. of the Int. Speech Communication Assoc., 6-10 September 2015.

Summary

Significant performance gains have been reported separately for speaker recognition (SR) and language recognition (LR) tasks using either DNN posteriors of sub-phonetic units or DNN feature representations, but the two techniques have not been compared on the same SR or LR task or across SR and LR tasks using the same DNN. In this work we present the application of a single DNN for both tasks using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks. Using a single DNN trained on Switchboard data we demonstrate large gains in performance on both benchmarks: a 55% reduction in EER for the DAC13 out-of-domain condition and a 48% reduction in Cavg on the LRE11 30s test condition. Score fusion and feature fusion are also investigated as is the performance of the DNN technologies at short durations for SR.
READ LESS

Summary

Significant performance gains have been reported separately for speaker recognition (SR) and language recognition (LR) tasks using either DNN posteriors of sub-phonetic units or DNN feature representations, but the two techniques have not been compared on the same SR or LR task or across SR and LR tasks using the...

READ MORE

Deep neural network approaches to speaker and language recognition

Published in:
IEEE Signal Process. Lett., Vol. 22, No. 10, October 2015, pp. 1671-5.

Summary

The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR). Prior work has shown performance gains for separate SR and LR tasks using DNNs for direct classification or for feature extraction. In this work we present the application for single DNN for both SR and LR using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks. Using a single DNN trained for ASR on Switchboard data we demonstrate large gains on performance in both benchmarks: a 55% reduction in EER for the DAC13 out-of-domain condition and a 48% reduction in Cavg on the LRE11 30 s test condition. It is also shown that further gains are possible using score or feature fusion leading to the possibility of a single i-vector extractor producing state-of-the-art SR and LR performance.
READ LESS

Summary

The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR). Prior work has shown performance gains for separate SR and LR tasks using DNNs for...

READ MORE

The MITLL NIST LRE 2011 language recognition system

Summary

This paper presents a description of the MIT Lincoln Laboratory (MITLL) language recognition system developed for the NIST 2011 Language Recognition Evaluation (LRE). The submitted system consisted of a fusion of four core classifiers, three based on spectral similarity and one based on tokenization. Additional system improvements were achieved following the submission deadline. In a major departure from previous evaluations, the 2011 LRE task focused on closed-set pairwise performance so as to emphasize a system's ability to distinguish confusable language pairs. Results are presented for the 24-language confusable pair task at test utterance durations of 30, 10, and 3 seconds. Results are also shown using the standard detection metrics (DET, minDCF) and it is demonstrated the previous metrics adequately cover difficult pair performance. On the 30 s 24-language confusable pair task, the submitted and post-evaluation systems achieved average costs of 0.079 and 0.070 and standard detection costs of 0.038 and 0.033.
READ LESS

Summary

This paper presents a description of the MIT Lincoln Laboratory (MITLL) language recognition system developed for the NIST 2011 Language Recognition Evaluation (LRE). The submitted system consisted of a fusion of four core classifiers, three based on spectral similarity and one based on tokenization. Additional system improvements were achieved following...

READ MORE

Language recognition via i-vectors and dimensionality reduction

Published in:
2011 INTERSPEECH, 27-31 August 2011, pp. 857-860.

Summary

In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification. Various techniques are employed to extract the most salient features in the lower dimensional i-vector space and the system developed results in excellent performance on the 2009 LRE evaluation set without the need for any post-processing or backend techniques. Additional performance gains are observed when the system is combined with other acoustic systems.
READ LESS

Summary

In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification. Various techniques are employed to extract the most salient features in the lower dimensional i-vector space and the system developed results in excellent performance on the...

READ MORE

The MIT LL 2010 speaker recognition evaluation system: scalable language-independent speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 5272-5275.

Summary

Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of recent 2010 NIST-SRE Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission applied to the tasks of the 2010 NIST-SRE with two main goals--language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent computationally-scalable system. For robustness, systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems were implemented. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.
READ LESS

Summary

Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of recent 2010 NIST-SRE Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we...

READ MORE

Towards reduced false-alarms using cohorts

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 4512-4515.

Summary

The focus of the 2010 NIST Speaker Recognition Evaluation (SRE) was the low false alarm regime of the detection error trade-off (DET) curve. This paper presents several approaches that specifically target this issue. It begins by highlighting the main problem with operating in the low-false alarm regime. Two sets of methods to tackle this issue are presented that require a large and diverse impostor set: the first set penalizes trials whose enrollment and test utterances are not nearest neighbors of each other while the second takes an adaptive score normalization approach similar to TopNorm and ATNorm.
READ LESS

Summary

The focus of the 2010 NIST Speaker Recognition Evaluation (SRE) was the low false alarm regime of the detection error trade-off (DET) curve. This paper presents several approaches that specifically target this issue. It begins by highlighting the main problem with operating in the low-false alarm regime. Two sets of...

READ MORE

Showing Results

1-7 of 7