Publications

Acoustic, phonetic, and discriminative approaches to automatic language identification

Summary

Formal evaluations conducted by NIST in 1996 demonstrated that systems that used parallel banks of tokenizer-dependent language models produced the best language identification performance. Since that time, other approaches to language identification have been developed that match or surpass the performance of phone-based systems. This paper describes and evaluates three techniques that have been applied to the language identification problem: phone recognition, Gaussian mixture modeling, and support vector machine classification. A recognizer that fuses the scores of three systems that employ these techniques produces a 2.7% equal error rate (EER) on the 1996 NIST evaluation set and a 2.8% EER on the NIST 2003 primary condition evaluation set. An approach to dealing with the problem of out-of-set data is also discussed.
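The equal error rate (EER) quoted above is the operating point at which the miss rate on target trials equals the false-alarm rate on non-target trials. The following Python sketch is not taken from the paper. It approximates the EER by sweeping a decision threshold over the observed scores. The function name, and the convention that higher scores favor the target language, are assumptions made for illustration.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores favor the target hypothesis and that both score
    lists are non-empty. A trial is accepted when its score is >= threshold.
    """
    best_gap = float("inf")
    eer = 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss rate: fraction of target trials rejected at this threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: fraction of non-target trials accepted.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (p_miss + p_fa) / 2
    return eer


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.2]
    nontargets = [0.1, 0.3, 0.4, 0.6]
    print(equal_error_rate(targets, nontargets))  # 0.25
```

A threshold sweep over the observed scores is the simplest estimator. Evaluation tools usually interpolate between operating points on the DET curve instead.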

Fusing high- and low-level features for speaker recognition

Summary

The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have produced low error rates, they ignore the higher levels of speaker information that lie beyond low-level acoustics. Recently published works have demonstrated that such high-level information can be used successfully in automatic speaker recognition systems, improving accuracy and potentially increasing robustness. Wide-ranging approaches based on high-level features, using pronunciation models, prosodic dynamics, pitch gestures, phone streams, and conversational interactions, were explored and developed under the SuperSID project at the 2002 JHU CLSP Summer Workshop (WS2002): http://www.clsp.jhu.edu/ws2002/groups/supersid/. In this paper, we show how these novel features and classifiers provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST Extended Data Task to 0.2%, a 71% relative reduction in error over the previous state of the art.
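The abstract does not say how the systems are fused. A common baseline, shown here only as an assumed illustration, normalizes each system's scores and takes a weighted sum per trial. The helper names, the example system names, and the fixed weights are all hypothetical. In practice, the weights would be trained on held-out development trials.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_normalize(scores: Sequence[float]) -> list[float]:
    """Shift and scale one system's scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        # A constant-scoring system carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(system_scores: Sequence[Sequence[float]],
                weights: Sequence[float],
                bias: float = 0.0) -> list[float]:
    """Weighted linear fusion: system_scores[k][i] is system k's score on trial i."""
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    n_trials = len(system_scores[0])
    if any(len(s) != n_trials for s in system_scores):
        raise ValueError("every system must score the same trials")
    return [bias + sum(w * s[i] for w, s in zip(weights, system_scores))
            for i in range(n_trials)]


if __name__ == "__main__":
    # Hypothetical scores from a cepstral system and a prosodic system.
    cepstral = z_normalize([1.0, 3.0, 2.0])
    prosodic = z_normalize([10.0, 30.0, 20.0])
    print(fuse_scores([cepstral, prosodic], weights=[0.7, 0.3]))
```

Normalizing first keeps a system with a wide score range from dominating the sum simply because of its scale.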

Person authentication by voice: a need for caution

Published in:
8th European Conf. on Speech Communication and Technology, EUROSPEECH, 1-4 September 2003.

Summary

Because of recent events and as members of the scientific community working in the field of speech processing, we feel compelled to publicize our views concerning the possibility of identifying or authenticating a person from his or her voice. The need for a clear and common message was indeed shown by the diversity of information that has been circulating on this matter in the media and among the general public over the past year. In a press release initiated by the AFCP and further elaborated in collaboration with the SpLC ISCA-SIG, the two groups herein discuss and present a summary of the current state of scientific knowledge and technological development in the field of speaker recognition, in wording accessible to nonspecialists. Our main conclusion is that, despite the existence of technological solutions to some constrained applications, at the present time, there is no scientific process that enables one to uniquely characterize a person's voice or to identify with absolute certainty an individual from his or her voice.

Integration of speaker recognition into conversational spoken dialogue systems

Summary

In this paper, we examine the integration of speaker identification/verification technology into two dialogue systems developed at MIT: the Mercury air travel reservation system and the Orion task delegation system. Both systems use information collected from registered users that is useful for personalizing the system to specific users and that must be securely protected from impostors. Two speaker recognition systems, the MIT Lincoln Laboratory text-independent GMM-based system and the MIT Laboratory for Computer Science text-constrained, speaker-adaptive ASR-based system, are evaluated and compared within the context of these conversational systems.
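A text-independent GMM system typically scores a test utterance by the average log-likelihood ratio between a claimed speaker's model and a universal background model. This Python sketch assumes diagonal-covariance Gaussians and uses hypothetical helper names. It illustrates that scoring rule only and is not either of the evaluated systems.

```python
import math
from typing import Sequence

# One mixture component: (weight, mean vector, diagonal variance vector).
Component = tuple[float, list[float], list[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], gmm: Sequence[Component]) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def llr_score(frames: Sequence[Sequence[float]], speaker: Sequence[Component],
              ubm: Sequence[Component]) -> float:
    """Average per-frame log-likelihood ratio of speaker model vs. background model."""
    total = sum(gmm_log_likelihood(x, speaker) - gmm_log_likelihood(x, ubm)
                for x in frames)
    return total / len(frames)


if __name__ == "__main__":
    speaker = [(1.0, [1.0], [1.0])]
    ubm = [(1.0, [0.0], [1.0])]
    # A positive score favors the claimed speaker over the background model.
    print(round(llr_score([[1.0]], speaker, ubm), 6))  # 0.5
```

A verification decision would then compare this score with a threshold chosen for the application's tolerance of false accepts versus false rejects.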

Model compression for GMM based speaker recognition systems

Published in:
EUROSPEECH 2003, 1-4 September 2003.

Summary

For large-scale deployments of speaker verification systems, model size can be an important issue, not only for minimizing storage requirements but also for reducing the transfer time of models over networks. Model size is also critical for deployments to small, portable devices. In this paper, we present a new model compression technique for Gaussian Mixture Model (GMM) based speaker recognition systems. For GMM systems using adaptation from a background model, the compression technique exploits the fact that speaker models are adapted from a single speaker-independent model and not all parameters need to be stored. We present results on the 2002 NIST speaker recognition evaluation cellular telephone corpus and show that the compression technique provides a good trade-off between compression ratio and performance loss. We are able to achieve a 56:1 compression (from 624 KB to 11 KB) with only a 3.2% relative increase in EER (from 9.1% to 9.4%).
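The principle described above can be sketched under two simplifying assumptions: mean-only adaptation from the background model, and a hypothetical distance threshold that decides which adapted means are worth storing. Components that are left out are restored from the background model at load time, so the scheme is lossy. This illustrates the general idea only and is not the technique evaluated in the paper.

```python
import math
from typing import Sequence


def compress_means(speaker_means: Sequence[Sequence[float]],
                   ubm_means: Sequence[Sequence[float]],
                   threshold: float) -> dict[int, list[float]]:
    """Keep only the adapted means that moved more than `threshold` from the UBM.

    Everything else (weights, variances, unmoved means) is assumed to be
    shared with the background model and is not stored per speaker.
    """
    kept = {}
    for i, (m_spk, m_ubm) in enumerate(zip(speaker_means, ubm_means)):
        if math.dist(m_spk, m_ubm) > threshold:
            kept[i] = list(m_spk)
    return kept


def decompress_means(kept: dict[int, list[float]],
                     ubm_means: Sequence[Sequence[float]]) -> list[list[float]]:
    """Rebuild a full mean table, filling dropped components from the UBM."""
    return [list(kept.get(i, m)) for i, m in enumerate(ubm_means)]


if __name__ == "__main__":
    ubm = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    speaker = [[0.0, 0.0], [1.5, 1.0], [2.0, 2.01]]
    stored = compress_means(speaker, ubm, threshold=0.1)
    print(stored)                          # {1: [1.5, 1.0]}
    print(decompress_means(stored, ubm))   # [[0.0, 0.0], [1.5, 1.0], [2.0, 2.0]]
```

Raising the threshold stores fewer components and shrinks the model further, at the cost of discarding more of the speaker-specific adaptation. This is the same compression-versus-accuracy trade-off the abstract reports.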

Measuring the readability of automatic speech-to-text transcripts

Summary

This paper reports initial results from a novel psycholinguistic study that measures the readability of several types of speech transcripts. We define a four-part figure of merit to measure readability: accuracy of answers to comprehension questions, reaction time for passage reading, reaction time for question answering, and a subjective rating of passage difficulty. We present results from an experiment with 28 test subjects reading transcripts in four experimental conditions.

An examination of wind shear alert integration at the Dallas/Ft. Worth International Airport (DFW)

Published in:
MIT Lincoln Laboratory Report ATC-309

Summary

The Dallas/Fort Worth International Airport (DFW) is one of the four demonstration system sites for the Integrated Terminal Weather System (ITWS). One of the primary benefits of the ITWS is a suite of algorithms that use data from the Terminal Doppler Weather Radar (TDWR) to generate wind shear alerts. DFW also benefits from a Network Expansion of the Low-Level Wind Shear Advisory System (LLWAS-NE). The LLWAS-NE generated alerts are integrated with the radar-based alerts in ITWS to provide Air Traffic Control (ATC) with a comprehensive set of alert information. This study examines the integrated DFW wind shear alerts, with emphasis on circumstances in which the detection performance of the TDWR-based wind shear algorithms was poor. Specific detection problems occur in four situations: when wind shear events over the airport are aligned along a radial to the TDWR; during "non-traditional" wind shear events; when heavy precipitation over the TDWR radar site causes severe signal attenuation; and when excessive TDWR clutter-residue editing is applied over the airport. In all of the cases examined, the LLWAS-NE issued alerts to ATC that would otherwise have gone unreported.

Range-velocity ambiguity mitigation schemes for the enhanced Terminal Doppler Weather Radar

Published in:
37th Int. Conf. on Radar Meteorology, 6-12 August 2003.

Summary

The Terminal Doppler Weather Radar (TDWR) radar data acquisition (RDA) subsystem is being replaced as part of a broader FAA program to improve the supportability of the system. An engineering prototype RDA is under development that will provide a modern, open-systems hardware platform and standards-compliant software. The new platform also provides an opportunity to insert algorithms to improve the quality of existing base data products, as well as support future enhancements to the aviation weather services provided by TDWR. There are several outstanding data quality issues with the TDWR. In this paper, we focus on mitigation schemes for the range-velocity ambiguity problem that is especially severe for C-band weather radars such as the TDWR.
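The range-velocity ambiguity follows from the pulse repetition frequency (PRF). The unambiguous range is c / (2 PRF) and the unambiguous (Nyquist) velocity is λ PRF / 4. Their product, cλ/8, is therefore fixed by the wavelength: any PRF choice that extends one shrinks the other. The Python sketch below evaluates both quantities. The 5 cm wavelength is an assumed round figure for C band, not a TDWR specification.

```python
SPEED_OF_LIGHT_MPS = 3.0e8  # approximate speed of light in m/s


def unambiguous_range_m(prf_hz: float) -> float:
    """Maximum range before echoes from one pulse arrive after the next pulse."""
    return SPEED_OF_LIGHT_MPS / (2.0 * prf_hz)


def unambiguous_velocity_mps(prf_hz: float, wavelength_m: float) -> float:
    """Nyquist velocity: radial speeds beyond +/- this value alias."""
    return wavelength_m * prf_hz / 4.0


if __name__ == "__main__":
    wavelength = 0.05  # assumed round C-band wavelength, not a TDWR spec
    for prf in (1000.0, 2000.0):
        r = unambiguous_range_m(prf) / 1000.0
        v = unambiguous_velocity_mps(prf, wavelength)
        print(f"PRF {prf:.0f} Hz: range {r:.1f} km, velocity +/-{v:.1f} m/s")
```

Doubling the PRF from 1000 Hz to 2000 Hz doubles the velocity interval (12.5 to 25 m/s) but halves the unambiguous range (150 to 75 km). Because the product scales with wavelength, a 5 cm C-band radar gets half the range-velocity budget of a 10 cm S-band radar. This is why the problem is especially severe at C band.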

High-fill-factor, burst-frame-rate charge-coupled device

Published in:
SPIE Vol. 5210, Ultrahigh- and High-Speed Photography, Photonics, and Videography, 3-8 August 2003, pp. 95-104.

Summary

A 512x512-element, multi-frame charge-coupled device (CCD) has been developed for collecting four sequential image frames at megahertz rates. To operate at fast frame rates with high sensitivity, the imager uses an electronic shutter technology developed for back-illuminated CCDs. Device-level simulations were done to estimate the CCD collection well spaces for sub-microsecond photoelectron collection times. Also required for the high frame rates were process enhancements that included metal strapping of the polysilicon gate electrodes and a second metal layer. Tests on finished back-illuminated CCD imagers have demonstrated sequential multi-frame capture capability with integration intervals in the hundreds of nanoseconds range.

Summary of the EO-1 ALI performance during the first 2.5 years on-orbit

Published in:
SPIE Vol. 5151, Earth Observing Systems VIII, 3-8 August 2003, pp. 574-585.

Summary

The Advanced Land Imager (ALI) is a VNIR/SWIR pushbroom instrument flying aboard the Earth Observing-1 (EO-1) spacecraft, which was launched on November 21, 2000. The objective of the ALI is to flight-validate emerging technologies that can be infused into future land imaging sensors. During its first two and one-half years on orbit, the performance of the ALI has been evaluated using on-board calibrators and vicarious observations. The results of this evaluation are presented here. The spatial performance of the instrument, derived using stellar, lunar, and bridge observations, is summarized. The radiometric stability of the focal plane and telescope, established using solar, lunar, ground truth, and on-board sources, is also provided.