Publications

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15-20 April 2007, Vol. IV, pp. 989-992.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Language recognition is typically performed with methods that exploit phonotactics--a phone recognition language modeling (PRLM) system. A PRLM system converts speech to a lattice of phones and then scores a language model. A standard extension to this scheme is to use multiple parallel phone recognizers (PPRLM). In this paper, we modify this approach in two distinct ways. First, we replace the phone tokenizer by a powerful speech-to-text system. Second, we use a discriminative support vector machine for language modeling. Our goals are twofold. First, we explore the ability of a single speech-to-text system to distinguish multiple languages. Second, we fuse the new system with an SVM PRLM system to see if it complements current approaches. Experiments on the 2005 NIST language recognition corpus show the new word system accomplishes these goals and has significant potential for language recognition.

READ LESS

Summary

Language recognition with word lattices and support vector machines

Robust speaker recognition with cross-channel data: MIT-LL results on the 2006 NIST SRE auxiliary microphone task

April 1, 2007

Conference Paper

Author:

Douglas E. Sturim

…

Published in:

Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-49 - IV-52.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

One particularly difficult challenge for cross-channel speaker verification is the auxiliary microphone task introduced in the 2005 and 2006 NIST Speaker Recognition Evaluations, where training uses telephone speech and verification uses speech from multiple auxiliary microphones. This paper presents two approaches to compensate for the effects of auxiliary microphones on the speech signal. The first compensation method mitigates session effects through Latent Factor Analysis (LFA) and Nuisance Attribute Projection (NAP). The second approach operates directly on the recorded signal with noise reduction techniques. Results are presented that show a reduction in the performance gap between telephone and auxiliary microphone data.

READ LESS

Summary

Robust speaker recognition with cross-channel data: MIT-LL results on the 2006 NIST SRE auxiliary microphone task

The MIT-LL/IBM 2006 speaker recognition system: high-performance reduced-complexity recognition

April 1, 2007

Conference Paper

Author:

…

Published in:

Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-217 - IV-220.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Many powerful methods for speaker recognition have been introduced in recent years--high-level features, novel classifiers, and channel compensation methods. A common arena for evaluating these methods has been the NIST speaker recognition evaluation (SRE). In the NIST SRE from 2002-2005, a popular approach was to fuse multiple systems based upon cepstral features and different linguistic tiers of high-level features. With enough enrollment data, this approach produced dramatic error rate reductions and showed conceptually that better performance was attainable. A drawback in this approach is that many high-level systems were being run independently requiring significant computational complexity and resources. In 2006, MIT Lincoln Laboratory focused on a new system architecture which emphasized reduced complexity. This system was a carefully selected mixture of high-level techniques, new classifier methods, and novel channel compensation techniques. This new system has excellent accuracy and has substantially reduced complexity. The performance and computational aspects of the system are detailed on a NIST 2006 SRE task.

READ LESS

Summary

The MIT-LL/IBM 2006 speaker recognition system: high-performance reduced-complexity recognition

Triage framework for resource conservation in a speaker identification system

April 1, 2007

Conference Paper

Author:

Arvind Jairam

…

Published in:

Proc. 32nd IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, April 2007, pp. IV-69 - IV-72.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

We present a novel framework for triaging (prioritizing and discarding) data to conserve resources for a speaker identification (SID) system. Our work is motivated by applications that require a SID system to process an overwhelming volume of audio data. We design a triage filter whose goal is to conserve recognizer resources while preserving relevant content. We propose triage methods that use signal quality assessment tools, a scaled-down version of the main recognizer itself, and a fusion of these measures. We define a new precision-based measure of effectiveness for our triage framework. Our experimental results with the 35-speaker tactical SID corpus bear out the validity of our approach.

READ LESS

Summary

Triage framework for resource conservation in a speaker identification system

Automatic language recognition via spectral and token based approaches

January 1, 2007

Book Chapter

Author:

Douglas A. Reynolds

…

Published in:

Chapter 41 in Springer Handbook of Speech Processing and Communication, 2007, pp. 811-24.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Automatic language recognition from speech consists of algorithms and techniques that model and classify the language being spoken. Current state-of-the-art language recognition systems fall into two broad categories: spectral- and token-sequence-based approaches. In this chapter, we describe algorithms for extracting features and models representing these types of language cues and systems for making recognition decisions using one or more of these language cues. A performance assessment of these systems is also provided, in terms of both accuracy and computation considerations, using the National Institute of Science and Technology (NIST) language recognition evaluation benchmarks.

READ LESS

Summary

Automatic language recognition via spectral and token based approaches

Missing feature theory with soft spectral subtraction for speaker verification

September 17, 2006

Conference Paper

Author:

Michael T. Padilla

…

Published in:

Interspeech 2006, ICSLP, 17-21 September 2006.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper considers the problem of training/testing mismatch in the context of speaker verification and, in particular, explores the application of missing feature theory in the case of additive white Gaussian noise corruption in testing. Missing feature theory allows for corrupted features to be removed from scoring, the initial step of which is the detection of these features. One method of detection, employing spectral subtraction, is studied in a controlled manner and it is shown that with missing feature compensation the resulting verification performance is improved as long as a minimum number of features remain. Finally, a blending of "soft" spectral subtraction for noise mitigation and missing feature compensation is presented. The resulting performance improves on the constituent techniques alone, reducing the equal error rate by about 15% over an SNR range of 5 - 25 dB.

READ LESS

Summary

Missing feature theory with soft spectral subtraction for speaker verification

An overview of automatic speaker diarization systems

September 1, 2006

Journal Article

Author:

Sue E. Tranter

…

Douglas A. Reynolds

Published in:

IEEE Trans. Audio, Speech, and Language Processing, Vol. 14, No. 5, September 2006, pp. 1557-1565.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating the searching and indexing of audio archives, and increasing the richness of automatic transcriptions, making them more readable. In this paper, we provide an overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, and discuss their relative merits and limitations. Performances using the different techniques are compared within the framework of the speaker diarization task in the DARPA EARS Rich Transcription evaluations. We also look at how the techniques are being introduced into real broadcast news systems and their portability to other domains and tasks such as meetings and speaker verification.

READ LESS

Summary

An overview of automatic speaker diarization systems

Advanced language recognition using cepstra and phonotactics: MITLL system performance on the NIST 2005 Language Recognition Evaluation

June 28, 2006

Conference Paper

Author:

…

Published in:

IEEE Odyssey 2006, 28-30 June 2006.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper presents a description of the MIT Lincoln Laboratory submissions to the 2005 NIST Language Recognition Evaluation (LRE05). As was true in 2003, the 2005 submissions were combinations of core cepstral and phonotactic recognizers whose outputs were fused to generate final scores. For the 2005 evaluation, Lincoln Laboratory had five submissions built upon fused combinations of six core systems. Major improvements included the generation of phone streams using lattices, SVM-based language models using lattice-derived phonotactics, and binary tree language models. In addition, a development corpus was assembled that was designed to test robustness to unseen languages and sources. Language recognition trends based on NIST evaluations conducted since 1996 show a steady improvement in language recognition performance.

READ LESS

Summary

Advanced language recognition using cepstra and phonotactics: MITLL system performance on the NIST 2005 Language Recognition Evaluation

Experiments with lattice-based PPRLM language identification

June 28, 2006

Conference Paper

Author:

Wade Shen

…

Published in:

IEEE Odyssey, 28-30 June 2006.

Topic:

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper we describe experiments conducted during the development of a lattice-based PPRLM language identification system as part of the NIST 2005 language recognition evaluation campaign. In experiments following LRE05 the PPRLM-lattice sub-system presented here achieved a 30s/primary condition EER of 4.87%, making it the single best performing recognizer developed by the MIT-LL team. Details of implementation issues and experimental results are presented and interactions with backend score normalization are explored.

READ LESS

Summary

Experiments with lattice-based PPRLM language identification

Understanding scores in forensic speaker recognition

June 28, 2006

Conference Paper

Author:

…

Published in:

IEEE Odyssey 2006, 28-30 June 2006.

Topic: