Publications

Approaches for language identification in mismatched environments

Summary

In this paper, we consider the task of language identification in the context of mismatch conditions. Specifically, we address the issue of using unlabeled data in the domain of interest to improve the performance of a state-of-the-art system. The evaluation is performed on a 9-language set that includes data in both conversational telephone speech and narrowband broadcast speech. Multiple experiments are conducted to assess the performance of the system in this condition and a number of alternatives to ameliorate the drop in performance. The best system evaluated is based on deep neural network (DNN) bottleneck features using i-vectors utilizing a combination of all the approaches proposed in this work. The resulting system improved baseline DNN system performance by 30%.

I-vector speaker and language recognition system on Android

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

I-Vector-based speaker and language identification provides state-of-the-art performance. However, this comes as a more computationally complex solution, which can often lead to challenges in resource-limited devices, such as phones or tablets. We present the implementation of an I-Vector speaker and language recognition system on the Android platform in the form of a fully functional application that allows speaker enrollment and language/speaker scoring within mobile contexts. We include a detailed account of the challenges of porting the system and its dependencies, which were necessary to optimize matrix operations in the I-Vector implementation. The system was benchmarked on a Google Nexus 6, showing a speed increase of 61.68% in scoring and 82.63% in enrollment operations with the implemented optimizations. The application was tested in mobile settings on a Nexus 7 tablet with forty participants, showing a rough accuracy of 84%. The optimized platform showed the capacity to perform near real-time recognition within a mobile setting and showcases the viability of I-Vector systems in resource-limited environments.

Corpora for the evaluation of robust speaker recognition systems

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

The goal of this paper is to describe significant corpora available to support speaker recognition research and evaluation, along with details about the corpora collection and design. We describe the attributes of high-quality speaker recognition corpora. Considerations of the application, domain, and performance metrics are also discussed. Additionally, a literature survey of corpora used in speaker recognition research over the last 10 years is presented. Finally, we show the most common corpora used in the research community and review their success in enabling meaningful speaker recognition research.

The MITLL NIST LRE 2015 Language Recognition System

Summary

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks respectively.

The MITLL NIST LRE 2011 language recognition system

Summary

This paper presents a description of the MIT Lincoln Laboratory (MITLL) language recognition system developed for the NIST 2011 Language Recognition Evaluation (LRE). The submitted system consisted of a fusion of four core classifiers, three based on spectral similarity and one based on tokenization. Additional system improvements were achieved following the submission deadline. In a major departure from previous evaluations, the 2011 LRE task focused on closed-set pairwise performance so as to emphasize a system's ability to distinguish confusable language pairs. Results are presented for the 24-language confusable pair task at test utterance durations of 30, 10, and 3 seconds. Results are also shown using the standard detection metrics (DET, minDCF), and it is demonstrated that the previous metrics adequately cover difficult pair performance. On the 30 s 24-language confusable pair task, the submitted and post-evaluation systems achieved average costs of 0.079 and 0.070 and standard detection costs of 0.038 and 0.033, respectively.

Automatic detection of depression in speech using Gaussian mixture modeling with factor analysis

Summary

Of increasing importance in the civilian and military population is the recognition of Major Depressive Disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we investigate automatic classifiers of depression state that have the important property of mitigating nuisances due to data variability, such as speaker and channel effects, that are unrelated to levels of depression. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a six-week duration, along with standard clinical HAMD depression ratings. Preliminary experiments indicate that by mitigating nuisances, and thus focusing on depression severity as a class, we can significantly improve classification accuracy over baseline Gaussian-mixture-model-based classifiers.

Language recognition via i-vectors and dimensionality reduction

Published in:
2011 INTERSPEECH, 27-31 August 2011, pp. 857-860.

Summary

In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification. Various techniques are employed to extract the most salient features in the lower-dimensional i-vector space, and the resulting system achieves excellent performance on the 2009 LRE evaluation set without the need for any post-processing or backend techniques. Additional performance gains are observed when the system is combined with other acoustic systems.

Informative dialect recognition using context-dependent pronunciation modeling

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 4396-4399.

Summary

We propose an informative dialect recognition system that learns phonetic transformation rules and uses them to identify dialects. A hidden Markov model is used to align reference phones with dialect-specific pronunciations to characterize when and how often substitutions, insertions, and deletions occur. Decision tree clustering is used to find context-dependent phonetic rules. We ran recognition tasks on 4 Arabic dialects. Not only do the proposed systems perform well on their own, but when fused with baselines they improve performance by 21-36% relative. In addition, our proposed decision-tree system beats the baseline monophone system in recovering phonetic rules by 21% relative. Pronunciation rules learned by our proposed system quantify the occurrence frequency of known rules and suggest rule candidates for further linguistic studies.

The MIT LL 2010 speaker recognition evaluation system: scalable language-independent speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 5272-5275.

Summary

Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of the recent 2010 NIST Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission applied to the tasks of the 2010 NIST SRE with two main goals: language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent, computationally scalable system. For robustness, systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems were implemented. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.

The MITLL NIST LRE 2009 language recognition system

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 4994-4997.

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2009 Language Recognition Evaluation (LRE). This system consists of a fusion of three core recognizers, two based on spectral similarity and one based on tokenization. The 2009 LRE differed from previous ones in that test data included narrowband segments from worldwide Voice of America broadcasts as well as conventional recorded conversational telephone speech. Results are presented for the 23-language closed-set and open-set detection tasks at the 30, 10, and 3 second durations along with a discussion of the language-pair task. On the 30 second 23-language closed set detection task, the system achieved a 1.64 average error rate.