Publications
Multimodal speaker authentication using nonacoustic sensors
Summary
Many nonacoustic sensors are now available to augment user authentication. Devices such as the GEMS (glottal electromagnetic micro-power sensor), the EGG (electroglottograph), and the P-mic (physiological mic) all have distinct methods of measuring physical processes associated with speech production. A potentially exciting aspect of the application of these sensors is...
Auditory signal processing as a basis for speaker recognition
Summary
In this paper, we exploit models of auditory signal processing at different levels along the auditory pathway for use in speaker recognition. A low-level nonlinear model, at the cochlea, provides accentuated signal dynamics, while a high-level model, at the inferior colliculus, provides frequency analysis of modulation components that reveals...
2-D processing of speech with application to pitch estimation
Summary
In this paper, we introduce a new approach to two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane. Specifically, we obtain the short-space 2-D Fourier transform magnitude of a narrowband spectrogram of the signal and show that this 2-D transformation maps harmonically related signal components to a...
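The idea that evenly spaced harmonic ridges along the frequency axis of a spectrogram patch become a concentrated peak in the 2-D transform domain can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a constant pitch across the patch and applies no 2-D window, and it uses a naive DFT so the code is self-contained. The function names and the 31.25 Hz bin width are hypothetical.

```python
import cmath
import math

def dft2_magnitude(patch):
    """Naive 2-D DFT magnitude of a T x F spectrogram patch (rows = time, columns = frequency)."""
    T, F = len(patch), len(patch[0])
    mag = [[0.0] * F for _ in range(T)]
    for k in range(T):
        for l in range(F):
            acc = 0j
            for t in range(T):
                for f in range(F):
                    acc += patch[t][f] * cmath.exp(-2j * math.pi * (k * t / T + l * f / F))
            mag[k][l] = abs(acc)
    return mag

def estimate_pitch(patch, bin_hz):
    """Hypothetical helper: assumes constant pitch over the patch, so harmonic ridges are
    parallel to the time axis and their 2-D energy falls on the k = 0 row."""
    F = len(patch[0])
    mag = dft2_magnitude(patch)
    # Skip DC (l = 0) and the mirrored upper half of the frequency-axis bins.
    l_peak = max(range(1, F // 2 + 1), key=lambda l: mag[0][l])
    spacing_bins = F / l_peak  # harmonic spacing, in spectrogram bins
    return spacing_bins * bin_hz

# Synthetic patch: 8 frames x 32 bins, harmonics every 4 bins, 31.25 Hz per bin.
patch = [[1 + math.cos(2 * math.pi * f / 4) for f in range(32)] for _ in range(8)]
print(estimate_pitch(patch, bin_hz=31.25))  # 125.0
```

Harmonics spaced 4 bins apart across a 32-bin patch give a peak at 2-D frequency index l = 8. Inverting that index recovers the spacing, and hence the pitch.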
Speaker verification using text-constrained Gaussian mixture models
Summary
In this paper, we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by an LVCSR system on conversational speech, allowing the system to focus on speaker differences over a constrained set of acoustic units. Results...
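Text-constrained scoring can be pictured as ordinary GMM-UBM log-likelihood-ratio scoring restricted to frames whose recognized word belongs to a chosen set. The sketch below makes several assumptions not taken from the paper:

- GMMs have diagonal covariances and are passed as (weights, means, variances) tuples.
- Segmentation uses a hypothetical (word, start_frame, end_frame) format.
- The function names are illustrative only.

```python
import math

def log_gmm(x, weights, means, variances):
    """Log-likelihood of frame x under a diagonal-covariance GMM (log-sum-exp over components)."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def text_constrained_llr(frames, segments, word_set, speaker, ubm):
    """Average per-frame log-likelihood ratio over frames inside segments whose word is in word_set.

    frames        -- list of feature vectors, one per frame
    segments      -- hypothetical (word, start_frame, end_frame) triples, end exclusive
    speaker, ubm  -- (weights, means, variances) tuples
    """
    selected = [frames[t]
                for word, start, end in segments if word in word_set
                for t in range(start, end)]
    if not selected:
        return None  # none of the constrained words occur in this utterance
    total = sum(log_gmm(x, *speaker) - log_gmm(x, *ubm) for x in selected)
    return total / len(selected)

speaker = ([1.0], [[1.0]], [[1.0]])
ubm = ([1.0], [[0.0]], [[1.0]])
frames = [[1.0], [1.0], [0.0], [0.0]]
segments = [("yeah", 0, 2), ("um", 2, 4)]
print(text_constrained_llr(frames, segments, {"yeah"}, speaker, ubm))
```

Only the frames aligned to "yeah" contribute to the score. Frames from every other word are ignored, which is what narrows the comparison to a constrained set of acoustic units.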
Speech enhancement based on auditory spectral change
Summary
In this paper, an adaptive approach to the enhancement of speech signals is developed based on auditory spectral change. The algorithm is motivated by the sensitivity of aural biologic systems to signal dynamics, by evidence that noise is aurally masked by rapid changes in a signal, and by analogies to these...
Speaker recognition from coded speech and the effects of score normalization
Summary
We investigate the effect of speech coding on automatic speaker recognition when training and testing conditions are matched and mismatched. Experiments used standard speech coding algorithms (GSM, G.729, G.723, MELP) and a speaker recognition system based on Gaussian mixture models adapted from a universal background model. There is little loss...
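A speaker model "adapted from a universal background model" is conventionally obtained by Bayesian (MAP) adaptation of the UBM parameters. Below is a minimal sketch of mean-only adaptation. The relevance factor of 16 is a common choice assumed here, not a value taken from this paper, and the function names and data layout are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Diagonal-covariance Gaussian log-density."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal GMM (the UBM) toward a speaker's frames."""
    K, D = len(weights), len(means[0])
    n = [0.0] * K                          # soft occupation counts per component
    first = [[0.0] * D for _ in range(K)]  # posterior-weighted sums of frames
    for x in frames:
        logs = [math.log(w) + gaussian_logpdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        s = sum(post)
        for i in range(K):
            g = post[i] / s                # component posterior for this frame
            n[i] += g
            for d in range(D):
                first[i][d] += g * x[d]
    adapted = []
    for i in range(K):
        alpha = n[i] / (n[i] + relevance)  # data-dependent adaptation coefficient
        ex = [f / n[i] for f in first[i]] if n[i] > 0 else means[i]
        adapted.append([alpha * e + (1 - alpha) * m for e, m in zip(ex, means[i])])
    return adapted

# One-component UBM at 0; 16 speaker frames at 2.0 -> alpha = 0.5, mean moves halfway.
print(map_adapt_means([[2.0]] * 16, [1.0], [[0.0]], [[1.0]]))  # [[1.0]]
```

Components that see little speaker data keep means close to the UBM. This is why the adapted model and the UBM can share a common structure during scoring.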
Speaker recognition from coded speech in matched and mismatched conditions
Summary
We investigate the effect of speech coding on automatic speaker recognition when training and testing conditions are matched and mismatched. Experiments use standard speech coding algorithms (GSM, G.729, G.723, MELP) and a speaker recognition system based on Gaussian mixture models adapted from a universal background model. There is little loss...
Estimation of handset nonlinearity with application to speaker recognition
Summary
A method is described for estimating telephone handset nonlinearity by matching the spectral magnitude of the distorted signal to the output of a nonlinear channel model, driven by an undistorted reference. This "magnitude-only" representation allows the model to directly match unwanted speech formants that arise over nonlinear channels and that...
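A toy version of "magnitude-only" matching fits a nonlinearity by comparing DFT magnitudes of the model output against those of the distorted signal, ignoring phase. This sketch deliberately simplifies the paper's nonlinear channel model to a memoryless cubic y = x + a·x³ with no surrounding filters, and it uses a plain grid search as the optimizer. Both choices are assumptions, and all names are hypothetical.

```python
import cmath
import math

def dft_magnitude(x):
    """Naive DFT magnitude spectrum."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N)]

def cubic_channel(x, a):
    """Hypothetical memoryless channel model: y = x + a * x**3."""
    return [v + a * v ** 3 for v in x]

def fit_cubic_magnitude_only(reference, distorted, candidates):
    """Pick the candidate a whose model output best matches |DFT| of the distorted signal."""
    target = dft_magnitude(distorted)
    def cost(a):
        model = dft_magnitude(cubic_channel(reference, a))
        return sum((m - d) ** 2 for m, d in zip(model, target))
    return min(candidates, key=cost)

N = 64
ref = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
dist = cubic_channel(ref, 0.3)
dist = dist[5:] + dist[:5]  # circular time shift: phase changes, magnitudes do not
grid = [i / 20 for i in range(11)]  # 0.0, 0.05, ..., 0.5
print(fit_cubic_magnitude_only(ref, dist, grid))  # 0.3
```

The circular shift shows why a magnitude-only criterion helps: the fit is unaffected by time misalignment between the reference and the distorted signal.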
Speaker recognition using G.729 speech codec parameters
Summary
Experiments in Gaussian-mixture-model speaker recognition from mel-filter bank energies (MFBs) of the G.729 codec all-pole spectral envelope showed significant performance loss relative to the standard mel-cepstral coefficients of G.729 synthesized (coded) speech. In this paper, we investigate two approaches to recover speaker recognition performance from G.729 parameters, rather than deriving...
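Mel-filter bank energies can be taken directly from an all-pole envelope by sampling |G/A(e^{jω})|² on a frequency grid and applying triangular mel-spaced filters. This sketch rests on several assumptions:

- A(z) follows the sign convention A(z) = 1 + Σ a_k z^{-k}.
- Mel values use the common 2595·log10(1 + f/700) scale.
- Parameter values are hypothetical.

It does not reproduce G.729's LSP decoding or the filter bank used in the paper.

```python
import cmath
import math

def allpole_power(lpc, gain, n_bins, fs):
    """Sample |gain / A(e^{jw})|^2 at n_bins frequencies from 0 to fs/2.
    lpc = [a1, ..., ap] with A(z) = 1 + sum_k a_k z^{-k} (assumed sign convention)."""
    freqs = [fs / 2 * i / (n_bins - 1) for i in range(n_bins)]
    power = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        a = 1 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(lpc))
        power.append(gain ** 2 / abs(a) ** 2)
    return freqs, power

def hz_to_mel(f):
    return 2595 * math.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mel_filterbank_energies(freqs, power, n_filters, fs):
    """Triangular filters with centers equally spaced on the mel scale from 0 to fs/2."""
    top = hz_to_mel(fs / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for f, p in zip(freqs, power):
            if lo < f <= mid:
                e += p * (f - lo) / (mid - lo)   # rising slope
            elif mid < f < hi:
                e += p * (hi - f) / (hi - mid)   # falling slope
        energies.append(e)
    return energies

# One-pole low-pass envelope, 8 kHz sampling, 20 filters.
freqs, power = allpole_power([-0.9], gain=1.0, n_bins=129, fs=8000)
mfb = mel_filterbank_energies(freqs, power, n_filters=20, fs=8000)
```

Computing MFBs this way skips waveform synthesis entirely, since every input comes from the codec's envelope parameters.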
Approaches to speaker detection and tracking in conversational speech
Summary
Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the...
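One generic way to turn frame-by-frame log-likelihood-ratio scores into target-speaker segments is to smooth the scores over a short window and then threshold them. This is an illustrative assumption, not the procedure in the paper, whose details are truncated here. The window half-width, the threshold, and the function names are hypothetical.

```python
def smooth(scores, half_width):
    """Moving average over a window of up to 2 * half_width + 1 frames (truncated at the edges)."""
    out = []
    n = len(scores)
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def track_segments(frame_llrs, half_width=2, threshold=0.0):
    """Return (start, end) frame ranges, end exclusive, where the smoothed LLR exceeds threshold."""
    smoothed = smooth(frame_llrs, half_width)
    segments, start = [], None
    for t, s in enumerate(smoothed):
        if s > threshold and start is None:
            start = t                      # entering a target-speaker region
        elif s <= threshold and start is not None:
            segments.append((start, t))    # leaving it
            start = None
    if start is not None:
        segments.append((start, len(smoothed)))
    return segments

print(track_segments([-1] * 5 + [1] * 5 + [-1] * 5, half_width=1))  # [(5, 10)]
```

Smoothing trades time resolution for robustness: a wider window suppresses isolated high-scoring frames but blurs segment boundaries.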