Publications
Speaker verification using text-constrained Gaussian mixture models
Summary
In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by an LVCSR system on conversational speech, allowing the system to focus on speaker differences over a constrained set of acoustic units. Results...
Speech enhancement based on auditory spectral change
Summary
In this paper, an adaptive approach to the enhancement of speech signals is developed based on auditory spectral change. The algorithm is motivated by sensitivity of aural biologic systems to signal dynamics, by evidence that noise is aurally masked by rapid changes in a signal, and by analogies to these...
Speaker recognition from coded speech and the effects of score normalization
Summary
We investigate the effect of speech coding on automatic speaker recognition when training and testing conditions are matched and mismatched. Experiments used standard speech coding algorithms (GSM, G.729, G.723, MELP) and a speaker recognition system based on Gaussian mixture models adapted from a universal background model. There is little loss...
Speaker recognition from coded speech in matched and mismatched conditions
Summary
We investigate the effect of speech coding on automatic speaker recognition when training and testing conditions are matched and mismatched. Experiments use standard speech coding algorithms (GSM, G.729, G.723, MELP) and a speaker recognition system based on Gaussian mixture models adapted from a universal background model. There is little loss...
The Lincoln speaker recognition system: NIST EVAL2000
Summary
This paper presents an overview of the Lincoln Laboratory systems fielded for the 2000 NIST speaker recognition evaluation (SRE00). In addition to the standard one-speaker detection tasks, this year's evaluation, as in 1999, included multi-speaker spokes dealing with detection, tracking and segmentation. The design approach for the Lincoln system in...
Speaker recognition using G.729 speech codec parameters
Summary
Experiments in Gaussian-mixture-model speaker recognition from mel-filter bank energies (MFBs) of the G.729 codec all-pole spectral envelope showed significant performance loss relative to the standard mel-cepstral coefficients of G.729 synthesized (coded) speech. In this paper, we investigate two approaches to recover speaker recognition performance from G.729 parameters, rather than deriving...
Approaches to speaker detection and tracking in conversational speech
Summary
Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the...
Speaker verification using adapted Gaussian mixture models
Summary
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background...
Speaker and language recognition using speech codec parameters
Summary
In this paper, we investigate the effect of speech coding on speaker and language recognition tasks. Three coders were selected to cover a wide range of quality and bit rates: GSM at 12.2 kb/s, G.729 at 8 kb/s, and G.723.1 at 5.3 kb/s. Our objective is to measure recognition performance...
Embedded dual-rate sinusoidal transform coding
Summary
This paper describes the development of a dual-rate Sinusoidal Transform Coder in which a 2400 b/s coder is embedded as a separate packet in the 4800 b/s bit stream. The underlying coding structure provides the flexibility necessary for multirate speech coding and multimedia applications.