Publications

Refine Results

(Filters Applied) Clear All

R&D Areas

R&D Groups

Year

Items per page

By

Douglas A. Reynolds Clear filter

Speaker recognition using G.729 speech codec parameters

June 5, 2000

Conference Paper

Author:

Thomas F. Quatieri

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 5-9 June 2000, pp. 1089-1092.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Experiments in Gaussian-mixture-model speaker recognition from mel-filter bank energies (MFBs) of the G.729 codec all-pole spectral envelope, showed significant performance loss relative to the standard mel-cepstral coefficients of G.729 synthesized (coded) speech. In this paper, we investigate two approaches to recover speaker recognition performance from G.729 parameters, rather than deriving cepstra from MFBs of an all-pole spectrum. Specifically, the G.729 LSFs are converted to "direct" cepstral coefficients for which there exists a one-to-one correspondence with the LSFs. The G.729 residual is also considered; in particular, appending G.729 pitch as a single parameter to the direct cepstral coefficients gives further performance gain. The second nonparametric approach uses the original MFB paradigm, but adds harmonic striations to the G.729 all-pole spectral envelope. Although obtaining considerable performance gains with these methods, we have yet to match the performance of G.729 synthesized speech, motivating the need for representing additional fine structure of the G.729 residual.

READ LESS

Summary

Speaker recognition using G.729 speech codec parameters

The NIST Speaker Recognition Evaluation - overview, methodology, systems, results, perspective

June 1, 2000

Journal Article

Author:

George R. Doddington

…

Published in:

Speech Commun., Vol. 31, Nos. 2-3, June 2000, pp. 225-254.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper, based on three presentations made in 1998 at the RLA2C Workshop in Avignon, discusses the evaluation of speaker recognition systems from several perspectives. A general discussion of the speaker recognition task and the challenges and issues involved in its evaluation is offered. The NIST evaluations in this area and specifically the 1998 evaluation, its objectives, protocols and test data, are described. The algorithms used by the systems that were developed for this evaluation are summarized, compared and contrasted. Overall performance results of this evaluation are presented by means of detection error trade-off (DET) curves. These show the performance trade-off of missed detections and false alarms for each system and the effects on performance of training condition, test segment duration, the speakers' sex and the match or mismatch of training and test handsets. Several factors that were found to have an impact on performance, including pitch frequency, handset type and noise, are discussed and DET curves showing their effects are presented. The paper concludes with some perspective on the history of this technology and where it may be going.

READ LESS

Summary

The NIST Speaker Recognition Evaluation - overview, methodology, systems, results, perspective

Approaches to speaker detection and tracking in conversational speech

January 1, 2000

Journal Article

Author:

Robert B. Dunn

…

Published in:

Digit. Signal Process., Vol. 10, No. 1, January/April/July, 2000, pp. 93-112. (Fifth Annual NIST Speaker Recognition Workshop, 3-4 June 1999.)

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the GMM-UBM system, are used to first partition the speech file into speaker homogenous regions and then to create scores for these regions. We refer to this approach as internal segmentation. Another approach uses an external segmentation algorithm, based on blind clustering, to partition the speech file into speaker homogenous regions. The adapted GMM-UBM system then scores each of these regions as in the single-speaker recognition case. We show that the external segmentation system outperforms the internal segmentation system for both detection and tracking. In addition, we show how different components of the detection and tracking algorithms contribute to the overall system performance.

READ LESS

Summary

Approaches to speaker detection and tracking in conversational speech

Speaker verification using adapted Gaussian mixture models

January 1, 2000

Journal Article

Author:

Douglas A. Reynolds

…

Published in:

Digit. Signal Process., Vol. 10, No. 1-3, January/April/July, 2000, pp. 19-41. (Fifth Annual NIST Speaker Recognition Workshop, 3-4 June 1999.)

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.

READ LESS

Summary

Speaker verification using adapted Gaussian mixture models

A study of computation speed-ups of the GMM-UBM speaker recognition system

September 5, 1999

Conference Paper

Author:

John J. McLaughlin

…

Published in:

6th European Conf. on Speech Communication and Technology, EUROSPEECH, 5-9 September 1999.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The Gaussian Mixture Model Universal Background Model (GMM-UBM) speaker recognition system has demonstrated very high performance in several NIST evaluations. Such evaluations, however, are concerned only with classification accuracy. In many applications, system effectiveness must be evaluated in light of both accuracy and execution speed. We present here a number of techniques for decreasing computation. Using data from the Switchboard telephone speech corpus, we show that significant speed-ups can be obtained while sacrificing surprisingly little accuracy. We expect that these techniques, involving lowering model order as well as processing fewer speech frames, will apply equally well to other recognition systems.

READ LESS

Summary

A study of computation speed-ups of the GMM-UBM speaker recognition system

Evaluation of confidence measures for language identification

September 5, 1999

Conference Paper

Author:

Kay M. Berkling

…

Published in:

6th European Conf. on Speech Communication and Technology, EUROSPEECH, 5-9 September 1999.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper we examine various ways to derive confidence measures for a language identification system, using phone recognition followed by language models, and describe the application of an evaluation metric for measuring the "goodness" of the different confidence measures. Experiments are conducted on the 1996 NIST Language Identification Evaluation corpus (derived from the Callfriend corpus of conversational telephone speech). The system is trained on the NIST 96 development data and evaluated on the NIST 96 evaluation data. Results indicate that we are able to predict the performance of a system and quantitatively evaluate how well the prediction holds on new data.

READ LESS

Summary

Evaluation of confidence measures for language identification

Speaker and language recognition using speech codec parameters

September 5, 1999

Conference Paper

Author:

Thomas F. Quatieri

…

Published in:

EUROSPEECH 99, 5-10 September 1999.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper, we investigate the effect of speech coding on speaker and language recognition tasks. Three coders were selected to cover a wide range of quality and bit rates: GSM at 12.2 kb/s, G.729 at 8 kb/s, and G.723.1 at 5.3 kb/s. Our objective is to measure recognition performance from either the synthesized speech or directly from the coder parameters themselves. We show that using speech synthesized from the three codecs, GMM-based speaker verification and phone-based language recognition performance generally degrades with coder bit rate, i.e., from GSM to G.729 to G.723.1, relative to an uncoded baseline. In addition, speaker verification for all codecs shows a performance decrease as the degree of mismatch between training and testing conditions increases, while language recognition exhibited no decrease in performance. We also present initial results in determining the relative importance of codec system components in their direct use for recognition tasks. For the G.729 codec, it is shown that removal of the post- filter in the decoder helps speaker verification performance under the mismatched condition. On the other hand, with use of G.729 LSF-based mel-cepstra, performance decreases under all conditions, indicating the need for a residual contribution to the feature representation.

READ LESS

Summary

Speaker and language recognition using speech codec parameters

Modeling of the glottal flow derivative waveform with application to speaker identification

September 1, 1999

Journal Article

Author:

M. D. Plumpe

…

Published in:

IEEE Trans. Speech Audio Process., Vol. 7, No. 5, September 1999, pp. 569-586.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

An automatic technique for estimating and modeling the glottal flow derivative source waveform from speech, and applying the model parameters to speaker identification, is presented. The estimate of the glottal flow derivative is decomposed into coarse structure, representing the general flow shape, and fine structure, comprising aspiration and other perturbations in the flow, from which model parameters are obtained. The glottal flow derivative is estimated using an inverse filter determined within a time interval of vocal-fold closure that is identified through differences in formant frequency modulation during the open and closed phases of the glottal cycle. This formant motion is predicted by Ananthapadmanabha and Fant to be a result of time-varying and nonlinear source/vocal tract coupling within a glottal cycle. The glottal flow derivative estimate is modeled using the Liljencrants-Fant model to capture its coarse structure, while the fine structure of the flow derivative is represented through energy and perturbation measures. The model parameters are used in a Gaussian mixture model speaker identification (SID) system. Both coarse- and fine-structure glottal features are shown to contain significant speaker-dependent information. For a large TIMIT database subset, averaging over male and female SID scores, the coarse-structure parameters achieve about 60% accuracy, the fine-structure parameters give about 40% accuracy, and their combination yields about 70% correct identification. Finally, in preliminary experiments on the counterpart telephone-degraded NTIMIT database, about a 5% error reduction in SID scores is obtained when source features are combined with traditional mel-cepstral measures.

READ LESS

Summary

Modeling of the glottal flow derivative waveform with application to speaker identification

Automatic speaker clustering from multi-speaker utterances

March 15, 1999

Conference Paper

Author:

John J. McLaughlin

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 15-19 March 1999, pp. 817-820.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Blind clustering of multi-person utterances by speaker is complicated by the fact that each utterance has at least two talkers. In the case of a two-person conversation, one can simply split each conversation into its respective speaker halves, but this introduces error which ultimately hurts clustering. We propose a clustering algorithm which is capable of associating each conversation with two clusters (and therefore two-speakers) obviating the need for splitting. Results are given for two speaker conversations culled from the Switchboard corpus, and comparisons are made to results obtained on single-speaker utterances. We conclude that although the approach is promising, our technique for computing inter-conversation similarities prior to clustering needs improvement.

READ LESS

Summary

Automatic speaker clustering from multi-speaker utterances

Corpora for the evaluation of speaker recognition systems

March 15, 1999

Conference Paper

Author:

Joseph P. Campbell Jr

…

Douglas A. Reynolds

Published in:

ICASSP 1999, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 15-19 March 1999.

Topic:

human language technology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Using standard speech corpora for development and evaluation has proven to be very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora's salient features with respect to their suitability for conducting speaker recognition experiments and evaluations. Links to these corpora, and to new corpora, will appear on the web http://www.apl.jhu.edu/Classes/Notes/Campbell/SpkrRec/. We hope to increase the awareness and use of these standard corpora and corresponding evaluation procedures throughout the speaker recognition community.

READ LESS

Summary

Corpora for the evaluation of speaker recognition systems

Publications

Refine Results

By

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Showing Results