Publications
Automatic language identification
Summary
Automatic language identification is the process by which the language of digitized spoken words is recognized by a computer. It is one of several processes in which information is extracted automatically from a speech signal.
Low-bit-rate speech coding
Summary
Low-bit-rate speech coding, at rates below 4 kb/s, is needed for both communication and voice storage applications. At such low rates, full encoding of the speech waveform is not possible; therefore, low-rate coders rely instead on parametric models to represent only the most perceptually relevant aspects of speech. While there...
Nuisance attribute projection
Summary
Cross-channel degradation is one of the significant challenges facing speaker recognition systems. We study this problem in the support vector machine (SVM) context and nuisance variable compensation in high-dimensional spaces more generally. We present an approach to nuisance variable compensation by removing nuisance attribute-related dimensions in the SVM expansion space...
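As a rough illustration of removing nuisance-related dimensions from a feature vector, the sketch below projects a vector onto the orthogonal complement of a set of nuisance directions. It is a minimal example under stated assumptions, not the paper's implementation: the function names, the toy vectors, and the single hand-picked nuisance direction are all hypothetical, and how such directions would be estimated from data is not covered by the excerpt above.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def remove_nuisance(x, directions):
    """Return x minus its components along each nuisance direction.

    Assumes `directions` is a list of orthonormal vectors (hypothetical
    input; estimating them is outside the scope of this sketch).
    """
    out = list(x)
    for u in directions:
        coeff = sum(ui * xi for ui, xi in zip(u, x))
        out = [oi - coeff * ui for oi, ui in zip(out, u)]
    return out

# Toy example: one assumed nuisance direction in a 3-dimensional space.
u = normalize([1.0, 1.0, 0.0])
x = [3.0, 1.0, 2.0]
y = remove_nuisance(x, [u])
```

After the projection, `y` has no component along `u`, so any classifier trained on projected vectors cannot rely on variation in that direction.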
Text-independent speaker recognition
Summary
In this chapter, we focus on the area of text-independent speaker verification, with an emphasis on unconstrained telephone conversational speech. We begin by providing a general likelihood ratio detection task framework to describe the various components in modern text-independent speaker verification systems. We next describe the general hierarchy of speaker...
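A likelihood ratio detection task can be sketched as comparing how well a target-speaker model and a background model explain the same observations, then thresholding the result. The example below is only an illustration: the one-dimensional Gaussian models, the parameter values, the function names, and the threshold of 0.0 are assumptions chosen to keep the code short, not details taken from the chapter.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (stand-in for a real speaker model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def avg_log_likelihood_ratio(frames, target, background):
    """Average per-frame log p(x | target) - log p(x | background)."""
    total = 0.0
    for x in frames:
        total += gaussian_logpdf(x, *target) - gaussian_logpdf(x, *background)
    return total / len(frames)

def accept(frames, target, background, threshold=0.0):
    """Accept the claimed speaker when the score exceeds the threshold."""
    return avg_log_likelihood_ratio(frames, target, background) > threshold

# Hypothetical (mean, variance) pairs for the two models.
target_model = (1.0, 1.0)
background_model = (0.0, 1.0)

claimed = accept([0.9, 1.1, 1.0], target_model, background_model)
impostor = accept([0.0, -0.1, 0.1], target_model, background_model)
```

Averaging over frames keeps the score comparable across utterances of different lengths; in practice the threshold would be tuned to the desired trade-off between false acceptances and false rejections.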
ILR-based MT comprehension test with multi-level questions
Summary
We present results from a new Interagency Language Roundtable (ILR) based comprehension test. This new test design presents questions at multiple ILR difficulty levels within each document. We incorporated Arabic machine translation (MT) output from three independent research sites, arbitrarily merging these materials into one MT condition. We contrast the...
A new approach to achieving high-performance power amplifier linearization
Summary
Digital baseband predistortion (DBP) is not particularly well suited to linearizing wideband power amplifiers (PAs); this is due to the exorbitant price paid in computational complexity. One of the underlying reasons for the computational complexity of DBP is the inherent inefficiency of using a sufficiently deep memory and a high...
Language recognition with word lattices and support vector machines
Summary
Language recognition is typically performed with methods that exploit phonotactics, namely a phone recognition language modeling (PRLM) system. A PRLM system converts speech to a lattice of phones and then scores a language model. A standard extension to this scheme is to use multiple parallel phone recognizers (PPRLM). In this paper, we...
Triage framework for resource conservation in a speaker identification system
Summary
We present a novel framework for triaging (prioritizing and discarding) data to conserve resources for a speaker identification (SID) system. Our work is motivated by applications that require a SID system to process an overwhelming volume of audio data. We design a triage filter whose goal is to conserve recognizer...
The MIT-LL/IBM 2006 speaker recognition system: high-performance reduced-complexity recognition
Summary
Many powerful methods for speaker recognition have been introduced in recent years: high-level features, novel classifiers, and channel compensation methods. A common arena for evaluating these methods has been the NIST speaker recognition evaluation (SRE). In the NIST SRE from 2002-2005, a popular approach was to fuse multiple systems based upon...
Multisensor dynamic waveform fusion
Summary
Speech communication is significantly more difficult in severe acoustic background noise environments, especially when low-rate speech coders are used. Non-acoustic sensors, such as radar sensors, vibrometers, and bone-conduction microphones, offer significant potential in these situations. We extend previous work on fixed waveform fusion from multiple sensors to an optimal dynamic...