Publications
Automatic language identification
Summary
Automatic language identification is the process by which the language of digitized spoken words is recognized by a computer. It is one of several processes in which information is extracted automatically from a speech signal.
Low-bit-rate speech coding
Summary
Low-bit-rate speech coding, at rates below 4 kb/s, is needed for both communication and voice storage applications. At such low rates, full encoding of the speech waveform is not possible; therefore, low-rate coders rely instead on parametric models to represent only the most perceptually relevant aspects of speech. While there...
Nuisance attribute projection
Summary
Cross-channel degradation is one of the most significant challenges facing speaker recognition systems. We study this problem in the context of support vector machines (SVMs) and, more generally, as nuisance variable compensation in high-dimensional spaces. We present an approach to nuisance variable compensation by removing nuisance attribute-related dimensions in the SVM expansion space...
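As a rough illustration of the projection idea (not the paper's exact procedure), removing nuisance dimensions can be written as applying P = I - U U^T to each SVM expansion vector, where U is assumed to have orthonormal columns spanning the nuisance subspace. The function names and the subspace estimator (top singular vectors of training vectors after each speaker's mean is subtracted) are hypothetical choices made for this sketch.

```python
import numpy as np

def nuisance_projection(U: np.ndarray) -> np.ndarray:
    """Return P = I - U U^T, which removes the span of U's columns.

    Assumes U is (d, k) with orthonormal columns spanning the nuisance subspace.
    """
    d = U.shape[0]
    return np.eye(d) - U @ U.T

def estimate_nuisance_subspace(X: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical estimator: top-k directions of within-speaker variation.

    X: (n, d) expansion vectors; labels: speaker id for each row.
    Subtracting each speaker's mean leaves mostly session/channel variation.
    """
    centered = X.astype(float).copy()
    for spk in np.unique(labels):
        idx = labels == spk
        centered[idx] -= X[idx].mean(axis=0)
    # Right singular vectors are the principal within-speaker directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # (d, k), orthonormal columns
```

A compensated vector is then simply `P @ x`; because P is symmetric and idempotent, the same projection can equivalently be folded into a linear SVM kernel as x^T P y.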
Text-independent speaker recognition
Summary
In this chapter, we focus on the area of text-independent speaker verification, with an emphasis on unconstrained telephone conversational speech. We begin by providing a general likelihood ratio detection task framework to describe the various components in modern text-independent speaker verification systems. We next describe the general hierarchy of speaker...
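A minimal sketch of the likelihood ratio detection framework, assuming diagonal-covariance Gaussian mixture models for the target speaker and for a background model (the function names are hypothetical): the score is the average per-frame log-likelihood ratio, and the claim is accepted when it exceeds a threshold.

```python
import numpy as np

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    X: (T, d) feature frames; weights: (M,); means, variances: (M, d).
    """
    diff = X[:, None, :] - means[None, :, :]                          # (T, M, d)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (M,)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None], axis=2)      # (T, M)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + log_exp
    # Log-sum-exp over mixture components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

def llr_score(X, target_gmm, background_gmm):
    """Average log p(x | target) - log p(x | background) over frames."""
    return float(np.mean(gmm_frame_loglik(X, *target_gmm)
                         - gmm_frame_loglik(X, *background_gmm)))

def verify(X, target_gmm, background_gmm, threshold=0.0):
    """Accept the identity claim when the LLR exceeds the threshold."""
    return llr_score(X, target_gmm, background_gmm) > threshold
```

Averaging over frames keeps scores comparable across utterances of different lengths; the threshold itself would be tuned on development data.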
ILR-based MT comprehension test with multi-level questions
Summary
We present results from a new Interagency Language Roundtable (ILR) based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the...
A new approach to achieving high-performance power amplifier linearization
Summary
Digital baseband predistortion (DBP) is not particularly well suited to linearizing wideband power amplifiers (PAs) because of its very high computational cost. One of the underlying reasons for the computational complexity of DBP is the inherent inefficiency of using a sufficiently deep memory and a high...
Language recognition with word lattices and support vector machines
Summary
Language recognition is typically performed with methods that exploit phonotactics, such as a phone recognition followed by language modeling (PRLM) system. A PRLM system converts speech to a lattice of phones and then scores it with a language model. A standard extension to this scheme is to use multiple parallel phone recognizers (PPRLM). In this paper, we...
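The PRLM scoring step can be sketched as follows, with one simplification stated up front: the sketch scores a single 1-best phone string rather than a full lattice. It assumes one add-alpha smoothed phone bigram model per language and picks the language whose model gives the highest log probability; the class and function names are hypothetical.

```python
import math
from collections import Counter

class BigramPhoneLM:
    """Add-alpha smoothed phone bigram model (hypothetical helper)."""

    def __init__(self, sequences, alpha=1.0):
        self.alpha = alpha
        self.unigrams = Counter()   # counts of each history phone
        self.bigrams = Counter()    # counts of (previous, current) pairs
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq)
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def log_prob(self, seq):
        padded = ["<s>"] + list(seq)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen phones
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha
            den = self.unigrams[prev] + self.alpha * v
            total += math.log(num / den)
        return total

def identify(phones, models):
    """Return the language whose phone model best explains the phone string."""
    return max(models, key=lambda lang: models[lang].log_prob(phones))
```

A PPRLM variant would run several phone recognizers in parallel and combine their per-language scores before the final decision.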
An evaluation of audio-visual person recognition on the XM2VTS corpus using the Lausanne protocols
Summary
A multimodal person recognition architecture has been developed for the purpose of improving overall recognition performance and for addressing channel-specific performance shortfalls. This multimodal architecture includes the fusion of a face recognition system with the MIT/LL GMM/UBM speaker recognition architecture. This architecture exploits the complementary and redundant nature of the face...
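The summary does not say which fusion rule is used, so the following is only an assumed example of score-level fusion: each modality's raw score is z-normalized against that modality's impostor scores, and the two normalized scores are combined with a fixed weight. All names and the weighting scheme are hypothetical.

```python
from statistics import mean, pstdev

def znorm(score, impostor_scores):
    """Normalize a score by the mean and spread of impostor scores.

    Assumes impostor_scores has nonzero spread.
    """
    return (score - mean(impostor_scores)) / pstdev(impostor_scores)

def fuse(face_score, speaker_score, face_impostors, speaker_impostors, w_face=0.5):
    """Weighted sum of z-normalized face and speaker scores."""
    f = znorm(face_score, face_impostors)
    s = znorm(speaker_score, speaker_impostors)
    return w_face * f + (1.0 - w_face) * s
```

Normalizing first puts the two systems' scores on a common scale, so the weight reflects relative trust in each modality rather than differences in raw score ranges.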
Robust speaker recognition with cross-channel data: MIT-LL results on the 2006 NIST SRE auxiliary microphone task
Summary
One particularly difficult challenge for cross-channel speaker verification is the auxiliary microphone task introduced in the 2005 and 2006 NIST Speaker Recognition Evaluations, where training uses telephone speech and verification uses speech from multiple auxiliary microphones. This paper presents two approaches to compensate for the effects of auxiliary microphones on...
Multisensor dynamic waveform fusion
Summary
Speech communication is significantly more difficult in severe acoustic background noise environments, especially when low-rate speech coders are used. Non-acoustic sensors, such as radar sensors, vibrometers, and bone-conduction microphones, offer significant potential in these situations. We extend previous work on fixed waveform fusion from multiple sensors to an optimal dynamic...