Publications

Refine Results

(Filters Applied) Clear All

R&D Areas

R&D Groups

Year

Items per page

By

Frederick S. Richardson Clear filter

The MITLL NIST LRE 2011 language recognition system

June 25, 2012

Conference Paper

Author:

Elliot Singer

…

Published in:

Odyssey 2012, the Speaker and Language Recognition Workshop, 25-28 June 2012.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper presents a description of the MIT Lincoln Laboratory (MITLL) language recognition system developed for the NIST 2011 Language Recognition Evaluation (LRE). The submitted system consisted of a fusion of four core classifiers, three based on spectral similarity and one based on tokenization. Additional system improvements were achieved following the submission deadline. In a major departure from previous evaluations, the 2011 LRE task focused on closed-set pairwise performance so as to emphasize a system's ability to distinguish confusable language pairs. Results are presented for the 24-language confusable pair task at test utterance durations of 30, 10, and 3 seconds. Results are also shown using the standard detection metrics (DET, minDCF) and it is demonstrated the previous metrics adequately cover difficult pair performance. On the 30 s 24-language confusable pair task, the submitted and post-evaluation systems achieved average costs of 0.079 and 0.070 and standard detection costs of 0.038 and 0.033.

READ LESS

Summary

The MITLL NIST LRE 2011 language recognition system

NAP for high level language identification

May 22, 2011

Conference Paper

Author:

Frederick S. Richardson

…

William M. Campbell

Published in:

ICASSP 2011, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 22-27 May 2011, pp. 4392-4395.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Varying channel conditions present a difficult problem for many speech technologies such as language identification (LID). Channel compensation techniques have been shown to significantly improve performance in LID for acoustic systems. For high-level token systems, nuisance attribute projection (NAP) has been shown to perform well in the context of speaker identification. In this work, we describe a novel approach to dealing with the high dimensional sparse NAP training problem as applied to a 4-gram phonotactic LID system run on the NIST 2009 Language Recognition Evaluation (LRE) task. We demonstrate performance gains on the Voice of America (VOA) portion of the 2009 LRE data.

READ LESS

Summary

NAP for high level language identification

The MIT LL 2010 speaker recognition evaluation system: scalable language-independent speaker recognition

May 22, 2011

Conference Paper

Author:

Douglas E. Sturim

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 5272-5275.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of recent 2010 NIST-SRE Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission applied to the tasks of the 2010 NIST-SRE with two main goals--language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent computationally-scalable system. For robustness, systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems were implemented. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.

READ LESS

Summary

The MIT LL 2010 speaker recognition evaluation system: scalable language-independent speaker recognition

USSS-MITLL 2010 human assisted speaker recognition

January 2, 2011

Conference Paper

Author:

Reva Schwartz

…

Published in:

Proc. IEEE ICASSP, 26 May 2011, pp. 5904-7.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes and our fusion of these processes, which are adapted from USSS casework. The USSS-MIT/LL 2010 HASR results are presented. We also present post-evaluation results. The results are encouraging within the resolving power of the evaluation, which was limited to enable reasonable levels of human effort. Future ideas and efforts are discussed, including new features and capitalizing on naive listeners.

READ LESS

Summary

USSS-MITLL 2010 human assisted speaker recognition

Transcript-dependent speaker recognition using mixer 1 and 2

September 26, 2010

Conference Paper

Author:

Frederick S. Richardson

…

Joseph P. Campbell Jr

Published in:

INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, 26-30 September 2010, pp. 2102-2015.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Transcript-dependent speaker-recognition experiments are performed with the Mixer 1 and 2 read-transcription corpus using the Lincoln Laboratory speaker recognition system. Our analysis shows how widely speaker-recognition performance can vary on transcript-dependent data compared to conversational data of the same durations, given enrollment data from the same spontaneous conversational speech. A description of the techniques used to deal with the unaudited data in order to create 171 male and 198 female text-dependent experiments from the Mixer 1 and 2 read transcription corpus is given.

READ LESS

Summary

Transcript-dependent speaker recognition using mixer 1 and 2

The MITLL NIST LRE 2009 language recognition system

March 15, 2010

Conference Paper

Author:

Pedro A. Torres-Carrasquillo

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15 March 2010, pp. 4994-4997.

Topic:

speech processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2009 Language Recognition Evaluation (LRE). This system consists of a fusion of three core recognizers, two based on spectral similarity and one based on tokenization. The 2009 LRE differed from previous ones in that test data included narrowband segments from worldwide Voice of America broadcasts as well as conventional recorded conversational telephone speech. Results are presented for the 23-language closed-set and open-set detection tasks at the 30, 10, and 3 second durations along with a discussion of the language-pair task. On the 30 second 23-language closed set detection task, the system achieved a 1.64 average error rate.

READ LESS

Summary

The MITLL NIST LRE 2009 language recognition system

The MIT Lincoln Laboratory 2008 speaker recognition system

September 6, 2009

Conference Paper

Author:

Douglas E. Sturim

…

Published in:

INTERSPEECH 2009, 6-10 September 2009.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In recent years methods for modeling and mitigating variational nuisances have been introduced and refined. A primary emphasis in this years NIST 2008 Speaker Recognition Evaluation (SRE) was to greatly expand the use of auxiliary microphones. This offered the additional channel variations which has been a historical challenge to speaker verification systems. In this paper we present the MIT Lincoln Laboratory Speaker Recognition system applied to the task in the NIST 2008 SRE. Our approach during the evaluation was two-fold: 1) Utilize recent advances in variational nuisance modeling (latent factor analysis and nuisance attribute projection) to allow our spectral speaker verification systems to better compensate for the channel variation introduced, and 2) fuse systems targeting the different linguistic tiers of information, high and low. The performance of the system is presented when applied on a NIST 2008 SRE task. Post evaluation analysis is conducted on the sub-task when interview microphones are present.

READ LESS

Summary

The MIT Lincoln Laboratory 2008 speaker recognition system

Discriminative N-gram selection for dialect recognition

September 6, 2009

Conference Paper

Author:

Frederick S. Richardson

…

Published in:

INTERSPEECH 2009, 6-10 September 2009.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Dialect recognition is a challenging and multifaceted problem. Distinguishing between dialects can rely upon many tiers of interpretation of speech data - e.g., prosodic, phonetic, spectral, and word. High-accuracy automatic methods for dialect recognition typically rely upon either phonetic or spectral characteristics of the input. A challenge with spectral system, such as those based on shifted-delta cepstral coefficients, is that they achieve good performance but do not provide insight into distinctive dialect features. In this work, a novel method based upon discriminative training and phone N- grams is proposed. This approach achieves excellent classification performance, fuses well with other systems, and has interpretable dialect characteristics in the phonetic tier. The method is demonstrated on data from the LDC and prior NIST language recognition evaluations. The method is also combined with spectral methods to demonstrate state-of-the-art performance in dialect recognition.

READ LESS

Summary

Discriminative N-gram selection for dialect recognition

The MITLL NIST LRE 2007 language recognition system

September 22, 2008

Conference Paper

Author:

Pedro A. Torres-Carrasquillo

…

Published in:

INTERSPEECH 2008, 22-26 September 2008, 719-722.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2007 Language Recognition Evaluation. This system consists of a fusion of four core recognizers, two based on tokenization and two based on spectral similarity. Results for NIST?s 14-language detection task are presented for both the closed-set and open-set tasks and for the 30, 10 and 3 second durations. On the 30 second 14-language closed set detection task, the system achieves a 1% equal error rate.

READ LESS

Summary

The MITLL NIST LRE 2007 language recognition system

Beyond frame independence: parametric modelling of time duration in speaker and language recognition

September 22, 2008

Conference Paper

Author:

Alan V. McCree

…

Published in:

INTERSPEECH 2008, 22-26 September 2008, pp. 767-770.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Tactical Edge Communications

Summary

In this work, we address the question of generating accurate likelihood estimates from multi-frame observations in speaker and language recognition. Using a simple theoretical model, we extend the basic assumption of independent frames to include two refinements: a local correlation model across neighboring frames, and a global uncertainty due to train/test channel mismatch. We present an algorithm for discriminative training of the resulting duration model based on logistic regression combined with a bisection search. We show that using this model we can achieve state-of-the-art performance for the NIST LRE07 task. Finally, we show that these more accurate class likelihood estimates can be combined to solve multiple problems using Bayes' rule, so that we can expand our single parametric backend to replace all six separate back-ends used in our NIST LRE submission for both closed and open sets.

READ LESS

Summary

Beyond frame independence: parametric modelling of time duration in speaker and language recognition

Publications

Refine Results

By

The MITLL NIST LRE 2011 language recognition system

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Showing Results