Publications
Speaker verification using adapted Gaussian mixture models
Summary
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background...
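As a rough illustration of the likelihood ratio test mentioned in this summary, the sketch below scores a set of feature vectors against a claimed-speaker GMM and a background GMM, then compares the average log-likelihood ratio to a threshold. It is a minimal sketch, not the Lincoln Laboratory system: the diagonal covariances, the per-frame averaging, the threshold value, and all function names are assumptions made here for illustration.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Return log p(x | GMM) for one feature vector x.

    Assumes diagonal covariances: `variances` holds one variance
    per dimension for each mixture component.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_gauss = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_gauss += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(math.log(w) + log_gauss)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def average_llr(frames, target, background):
    """Average per-frame log-likelihood ratio: target model vs. background model.

    `target` and `background` are (weights, means, variances) tuples.
    """
    total = 0.0
    for x in frames:
        total += gmm_log_likelihood(x, *target) - gmm_log_likelihood(x, *background)
    return total / len(frames)


def verify(frames, target, background, threshold=0.0):
    """Accept the identity claim when the average ratio reaches the threshold."""
    return average_llr(frames, target, background) >= threshold
```

In use, `target` would hold the parameters of the claimed speaker's model and `background` those of a model trained on many other speakers; the threshold trades false acceptances against false rejections and would be tuned on held-out data.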
Estimation of modulation based on FM-to-AM transduction: two-sinusoid case
Summary
A method is described for estimating the amplitude modulation (AM) and the frequency modulation (FM) of the components of a signal that consists of two AM-FM sinusoids. The approach is based on the transduction of FM to AM that occurs whenever a signal of varying frequency passes through a filter...
Shunting networks for multi-band AM-FM decomposition
Summary
We describe a transduction-based, neurodynamic approach to estimating the amplitude-modulated (AM) and frequency-modulated (FM) components of a signal. We show that the transduction approach can be realized as a bank of constant-Q bandpass filters followed by envelope detectors and shunting neural networks, and the resulting dynamical system is capable of...
Speaker and language recognition using speech codec parameters
Summary
In this paper, we investigate the effect of speech coding on speaker and language recognition tasks. Three coders were selected to cover a wide range of quality and bit rates: GSM at 12.2 kb/s, G.729 at 8 kb/s, and G.723.1 at 5.3 kb/s. Our objective is to measure recognition performance...
Modeling of the glottal flow derivative waveform with application to speaker identification
Summary
An automatic technique for estimating and modeling the glottal flow derivative source waveform from speech, and applying the model parameters to speaker identification, is presented. The estimate of the glottal flow derivative is decomposed into coarse structure, representing the general flow shape, and fine structure, comprising aspiration and other perturbations...
Implications of glottal source for speaker and dialect identification
Summary
In this paper we explore the importance of speaker-specific information carried in the glottal source. We time-align utterances of two speakers speaking the same sentence from the TIMIT database of American English. We then extract the glottal flow derivative from each speaker and interchange them. Through time alignment...
'Perfect reconstruction' time-scaling filterbanks
Summary
A filterbank-based method of time-scale modification is analyzed for elemental signals including clicks, sines, and AM-FM sines. It is shown that with the use of some basic properties of linear systems, as well as FM-to-AM filter transduction, "perfect reconstruction" time-scaling filterbanks can be constructed for these elemental signal classes under...
AM-FM separation using shunting neural networks
Summary
We describe an approach to estimating the amplitude-modulated (AM) and frequency-modulated (FM) components of a signal. Any signal can be written as the product of an AM component and an FM component. There have been several approaches to solving the AM-FM estimation problem described in the literature. Popular methods include...
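The statement that any signal can be written as the product of an AM component and an FM component can be made concrete with one standard construction, the analytic signal. The symbols below are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{align}
  x(t) &= a(t)\,\cos\phi(t), \qquad a(t) \ge 0, \\
  z(t) &= x(t) + j\,\hat{x}(t), \qquad \hat{x}(t) = \text{Hilbert transform of } x(t), \\
  a(t) &= |z(t)|, \qquad \phi(t) = \arg z(t), \qquad \omega(t) = \frac{d\phi(t)}{dt}.
\end{align}
```

The decomposition is not unique: any envelope with $a(t) \ge |x(t)|$ can be paired with a phase satisfying $\cos\phi(t) = x(t)/a(t)$. This is why AM-FM separation is posed as an estimation problem that needs additional constraints.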
Magnitude-only estimation of handset nonlinearity with application to speaker recognition
Summary
A method is described for estimating telephone handset nonlinearity by matching the spectral magnitude of the distorted signal to the output of a nonlinear channel model, driven by an undistorted reference. The "magnitude-only" representation allows the model to directly match unwanted speech formants that arise over nonlinear channels and that...
Audio signal processing based on sinusoidal analysis/synthesis
Summary
Based on a sinusoidal model, an analysis/synthesis technique is developed that characterizes audio signals, such as speech and music, in terms of the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated by applying a peak-picking algorithm to the short-time Fourier transform of the input waveform...
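The peak-picking step described in this summary can be sketched for a single analysis frame as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the Hann window, the direct DFT, the `min_amplitude` threshold, and the name `analyze_frame` are all choices made here for illustration, and the amplitude estimate is exact only for sinusoids that fall on a DFT bin centre.

```python
import cmath
import math


def analyze_frame(frame, sample_rate, min_amplitude=0.0):
    """Estimate (frequency_hz, amplitude, phase) of the sine waves in one frame.

    The frame is Hann-windowed, transformed with a direct DFT, and every
    local maximum of the magnitude spectrum is reported as one sinusoid.
    """
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n) for t in range(n)]
    # A bin-centred sinusoid of amplitude A produces a peak of A * sum(window) / 2.
    gain = sum(window) / 2.0

    # One-sided spectrum computed directly (O(n^2); fine for short frames).
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t in range(n):
            acc += window[t] * frame[t] * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(acc)
    mags = [abs(c) for c in spectrum]

    peaks = []
    for k in range(1, len(mags) - 1):
        amplitude = mags[k] / gain
        is_peak = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        # The threshold discards local maxima that are only numerical noise.
        if is_peak and amplitude >= min_amplitude:
            peaks.append((k * sample_rate / n, amplitude, cmath.phase(spectrum[k])))
    return peaks
```

A full analysis/synthesis system would repeat this over successive frames and match peaks from frame to frame before resynthesis; a practical version would also use an FFT and interpolate between bins to refine the frequency estimates.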