Publications

The MIT Lincoln Laboratory RT-04F diarization systems: applications to broadcast audio and telephone conversations

November 8, 2004

Conference Paper

Author:

Douglas A. Reynolds

…

Pedro A. Torres-Carrasquillo

Published in:

NIST Rich Transcription Workshop, 8-11 November 2004.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization has utility in making automatic transcripts more readable and in searching and indexing audio archives. In this paper we describe the systems developed by MITLL and used in the DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation. The primary system is based on a new proxy speaker model approach, and the secondary system follows a more standard BIC-based clustering approach. We present experiments analyzing the performance of the systems and present a cross-cluster recombination approach that significantly improves performance. We also present results from applying our system to a summed-channel speaker detection task on telephone speech.

Robust collaborative multicast service for airborne command and control environment

October 31, 2004

Conference Paper

Author:

Roger I. Khazan

…

Published in:

IEEE MILCOM 2004, Vol. 3, 31 October - 3 November 2004, pp. 1666-1674.

Topic:

communications

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

RCM (Robust Collaborative Multicast) is a communication service designed to support collaborative applications operating in dynamic, mission-critical environments. RCM implements a set of well-specified message ordering and reliability properties that balance two conflicting goals: a) providing a low-latency, highly available, reliable communication service, and b) guaranteeing global consistency in how different participants perceive their communication. Both of these goals are important for collaborative applications. In this paper, we describe RCM, its modular and flexible design, and a collection of simple, lightweight protocols that implement it. We also report on several experiments with an RCM prototype in a test-bed environment.

A comparison of soft and hard spectral subtraction for speaker verification

October 4, 2004

Conference Paper

Author:

Michael T. Padilla

…

Thomas F. Quatieri

Published in:

8th Int. Conf. on Spoken Language Processing, ICSLP 2004, 4-8 October 2004.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

An important concern in speaker recognition is the performance degradation that occurs when speaker models trained with speech from one type of channel are subsequently used to score speech from another type of channel, a condition known as channel mismatch. This paper investigates the relative performance of two different spectral subtraction methods for additive noise compensation in the context of speaker verification. The first method, termed "soft" spectral subtraction, is performed in the spectral domain on the |DFT|^2 values of the speech frames, while the second method, termed "hard" spectral subtraction, is performed on the Mel-filter energy features. It is shown through both an analytical argument and a simulation that soft spectral subtraction results in a higher signal-to-noise ratio in the resulting Mel-filter energy features. In the context of Gaussian mixture model-based speaker verification with additive noise in testing utterances, this is shown to result in an equal error rate improvement over a system without spectral subtraction of approximately 7% in absolute terms, 21% in relative terms, over an additive white Gaussian noise range of 5-25 dB.

Channel compensation for SVM speaker recognition

May 31, 2004

Conference Paper

Author:

Alex Solomonoff

…

Published in:

Odyssey, The Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions, but for the much more recent Support Vector Machine (SVM) based approaches to this problem, much less is known about the best way to handle this issue. In this paper we explore techniques that are specific to the SVM framework in order to derive fully non-linear channel compensations. The result is a system that is less sensitive to specific kinds of labeled channel variations observed in training.

The MMSR bilingual and crosschannel corpora for speaker recognition research and evaluation

May 31, 2004

Conference Paper

Author:

Joseph P. Campbell Jr

…

Published in:

ODYSSEY 2004, Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Topic:

human language technology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

We describe efforts to create corpora to support and evaluate systems that meet the challenge of speaker recognition in the face of both channel and language variation. In addition to addressing ongoing evaluation of speaker recognition systems, these corpora are aimed at the bilingual and crosschannel dimensions. We report on specific data collection efforts at the Linguistic Data Consortium, the 2004 speaker recognition evaluation program organized by the National Institute of Standards and Technology (NIST), and the research ongoing at the US Federal Bureau of Investigation and MIT Lincoln Laboratory. We cover the design and requirements, the collections, and the evaluation, integrating discussions of data preparation, research, technology development, and evaluation on a grand scale.

Fusing discriminative and generative methods for speaker recognition: experiments on switchboard and NFI/TNO field data

May 31, 2004

Conference Paper

Author:

William M. Campbell

…

Published in:

ODYSSEY 2004, Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Discriminatively trained support vector machines have recently been introduced as a novel approach to speaker recognition. Support vector machines (SVMs) have a distinctly different modeling strategy in the speaker recognition problem. The standard Gaussian mixture model (GMM) approach focuses on modeling the probability density of the speaker and the background (a generative approach). In contrast, the SVM models the boundary between the classes. Another interesting aspect of the SVM is that it does not directly produce probabilistic scores. This poses a challenge for combining results with a GMM. We therefore propose strategies for fusing the two approaches. We show that the SVM and GMM are complementary technologies. Recent evaluations by NIST (telephone data) and NFI/TNO (forensic data) give a unique opportunity to test the robustness and viability of fusing GMM and SVM methods. We show that fusion produces a system which can have relative error rates 23% lower than individual systems.

Dialect identification using Gaussian mixture models

May 31, 2004

Conference Paper

Author:

Pedro A. Torres-Carrasquillo

…

Published in:

ODYSSEY 2004, Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Recent results in the area of language identification have shown a significant improvement over previous systems. In this paper, we evaluate the related problem of dialect identification using one of the techniques recently developed for language identification: Gaussian mixture models with shifted-delta-cepstral features. The system shown is developed using the same methodology followed for the language identification case. Results show that the GMM techniques yield an average equal error rate of 30% for the dialects in the Miami corpus and about 13% for the dialects in the CallFriend corpus.

Speaker diarisation for broadcast news

May 31, 2004

Conference Paper

Author:

Sue E. Tranter

…

Douglas A. Reynolds

Published in:

Odyssey 2004, 31 May - 4 June 2004.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

It is often important to be able to automatically label 'who spoke when' in audio data. This paper describes two systems for audio segmentation developed at CUED and MIT-LL and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure and BIC-based stopping criterion are introduced for the CUED system, improving both performance and robustness to changes in segmentation. Finally, a hybrid 'Plug and Play' system is built which combines different parts of the CUED and MIT-LL systems to produce a single system that outperforms both individual systems.

Language recognition with support vector machines

May 31, 2004

Conference Paper

Author:

William M. Campbell

…

Published in:

ODYSSEY 2004, Speaker and Language Recognition Workshop, 31 May - 3 June 2004.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Support vector machines (SVMs) have become a popular tool for discriminative classification. Powerful theoretical and computational tools for support vector machines have enabled significant improvements in pattern classification in several areas. An exciting area of recent application of support vector machines is speech processing. A key aspect of applying SVMs to speech is to provide an SVM kernel that compares sequences of feature vectors: a sequence kernel. We propose the use of sequence kernels for language recognition. We apply our methods to the NIST 2003 language evaluation task. Results demonstrate the potential of the new SVM methods.

The effect of text difficulty on machine translation performance -- a pilot study with ILR-related texts in Spanish, Farsi, Arabic, Russian and Korean

May 26, 2004

Conference Paper

Author:

Ray Clifford

…

Published in:

4th Int. Conf. on Language Resources and Evaluation, LREC, 26-28 May 2004.

Topic:

machine translation

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

We report on initial experiments that examine the relationship between automated measures of machine translation performance (Doddington, 2003, and Papineni et al. 2001) and the Interagency Language Roundtable (ILR) scale of language proficiency/difficulty that has been in standard use for U.S. government language training and assessment for the past several decades (Child, Clifford and Lowe 1993). The main question we ask is how technology-oriented measures of MT performance relate to the ILR difficulty levels, where we understand that a linguist with ILR proficiency level N is expected to be able to understand a document rated at level N, but to have increasing difficulty with documents at higher levels. In this paper, we find that some key aspects of MT performance track with ILR difficulty levels, primarily for MT output whose quality is good enough to be readable by human readers.
