Publications
Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks
Summary
The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. The preprocessing and use of this data have typically been performed with telephone bandwidth and quantization. Although this approach is viable, it ignores the richer properties of the microphone data: multiple channels, high-rate sampling, linear encoding, ambient noise...
Sinewave representations of nonmodality
Summary
Regions of nonmodal phonation, exhibiting deviations from uniform glottal-pulse periods and amplitudes, occur often and convey information about speaker- and linguistic-dependent factors. Such waveforms pose challenges for speech modeling, analysis/synthesis, and processing. In this paper, we investigate the representation of nonmodal pulse trains as a sum of harmonically-related sinewaves with...
USSS-MITLL 2010 human assisted speaker recognition
Summary
The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes and our fusion of these processes, which are adapted from...
Sinewave parameter estimation using the fast Fan-Chirp Transform
Summary
Sinewave analysis/synthesis has long been an important tool for audio analysis, modification, and synthesis [1]. The recently introduced Fan-Chirp Transform (FChT) [2,3] has been shown to improve the fidelity of sinewave parameter estimates for a harmonic audio signal with rapid frequency modulation [4]. A fast version of the FChT [3]...
Adaptive short-time analysis-synthesis for speech enhancement
Summary
In this paper we propose a multiresolution short-time analysis method for speech enhancement. It is well known that fixed-resolution methods such as the traditional short-time Fourier transform do not generally match the time-frequency structure of the signal being analyzed, resulting in poor estimates of the speech and noise spectra...
Sinewave analysis/synthesis based on the fan-chirp transform
Summary
There have been numerous recent strides toward making sinewave analysis consistent with time-varying sinewave models. This is particularly important in high-frequency speech regions where harmonic frequency modulation (FM) can be significant. One notable approach is the Fan-Chirp Transform, which provides a set of FM-sinewave basis functions consistent with...
An evaluation of audio-visual person recognition on the XM2VTS corpus using the Lausanne protocols
Summary
A multimodal person recognition architecture has been developed for the purpose of improving overall recognition performance and for addressing channel-specific performance shortfalls. This multimodal architecture includes the fusion of a face recognition system with the MIT/LL GMM/UBM speaker recognition architecture. This architecture exploits the complementary and redundant nature of the face...
Robust speaker recognition with cross-channel data: MIT-LL results on the 2006 NIST SRE auxiliary microphone task
Summary
One particularly difficult challenge for cross-channel speaker verification is the auxiliary microphone task introduced in the 2005 and 2006 NIST Speaker Recognition Evaluations, where training uses telephone speech and verification uses speech from multiple auxiliary microphones. This paper presents two approaches to compensate for the effects of auxiliary microphones on...
Beyond cepstra: exploiting high-level information in speaker recognition
Summary
Traditionally, speaker recognition techniques have focused on using short-term, low-level acoustic information such as cepstral features extracted over 20-30 ms windows of speech. But speech is a complex behavior conveying more information about the speaker than merely the sounds that are characteristic of his vocal apparatus. This higher-level information includes...
Fusing high- and low-level features for speaker recognition
Summary
The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have produced low error rates, they ignore higher levels of information beyond low-level acoustics that convey speaker information. Recently published works have demonstrated that such high-level...