Publications

Robust text-independent speaker identification using Gaussian mixture speaker models

Published in:
IEEE Trans. Speech Audio Process., Vol. 3, No. 1, January 1995, pp. 72-83.

Summary

This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterances from unconstrained conversational speech and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49-speaker, conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5-second clean speech utterances and 80.8% accuracy using 15-second telephone speech utterances with a 49-speaker population, and is shown to outperform the other speaker modeling techniques on an identical 16-speaker telephone speech task.

Sinusoidal coding

Published in:
Chapter 4 in Speech Coding and Synthesis, Elsevier Science Publishers, 1995, pp. 121-173.

Summary

This chapter summarizes the sinewave-based pitch extractor and the high-order all-pole modelling techniques that provided the basis for the multirate Sinusoidal Transform Coder and its application to multi-speaker conferencing.

Speaker identification and verification using Gaussian mixture speaker models

Published in:
Speech Commun., Vol. 17, 1995, pp. 91-108.

Summary

This paper presents high-performance speaker identification and verification systems based on Gaussian mixture speaker models: robust, statistically based representations of speaker identity. The identification system is a maximum likelihood classifier and the verification system is a likelihood ratio hypothesis tester using background speaker normalization. The systems are evaluated on four publicly available speech databases: TIMIT, NTIMIT, Switchboard and YOHO. The different levels of degradation and variabilities found in these databases allow the examination of system performance for different task domains. Constraints on the speech range from vocabulary-dependent to extemporaneous, and speech quality varies from near-ideal, clean speech to noisy, telephone speech. Closed-set identification accuracies on the 630-speaker TIMIT and NTIMIT databases were 99.5% and 60.7%, respectively. On a 113-speaker population from the Switchboard database the identification accuracy was 82.8%. Global threshold equal error rates of 0.24%, 7.19%, 5.15% and 0.51% were obtained in verification experiments on the TIMIT, NTIMIT, Switchboard and YOHO databases, respectively.
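
The two decision rules named in this summary can be illustrated with a small sketch. This is not the paper's implementation: the diagonal covariances, the toy two-dimensional "speakers", the log-mean-exp form of the background normalization, and the zero threshold are all assumptions made for illustration, and the names `identify` and `verify` are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log sum_k w_k N(x; mean_k, var_k).

    gmm is a list of (weight, mean, variance) tuples (assumed layout).
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def identify(frames, models):
    """Maximum likelihood classifier: the speaker whose GMM scores highest."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))

def verify(frames, claimed, background, threshold=0.0):
    """Likelihood ratio test of the claimed model against background speakers.

    Assumption: the background score is the log of the mean of the
    per-frame-normalized background likelihoods.
    """
    n = len(frames)
    claimed_score = gmm_log_likelihood(frames, claimed) / n
    bg_scores = [gmm_log_likelihood(frames, b) / n for b in background]
    peak = max(bg_scores)
    bg = peak + math.log(sum(math.exp(s - peak) for s in bg_scores) / len(bg_scores))
    llr = claimed_score - bg
    return llr, llr > threshold

# Toy models and feature frames, invented purely for this example.
models = {
    "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "bob": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[0.1, -0.2], [1.9, 2.1], [0.3, 0.0]]

best = identify(frames, models)
true_claim = verify(frames, models["alice"], [models["bob"]])
false_claim = verify(frames, models["bob"], [models["alice"]])
```

In this toy setup the frames lie near the first model's components, so identification selects that model and only the matching identity claim passes the ratio test. A real system would train the mixtures on cepstral features and tune the threshold on held-out data.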

Optimum time-varying FIR filter designs for the Airport Surveillance Radar wind shear processor

Published in:
MIT Lincoln Laboratory Report ATC-191

Summary

We have developed new design algorithms for finite impulse response (FIR) filters that compensate for arbitrary input spacing and that allow for arbitrary group delay specification. The potential of these new designs to work with the ASR-9 staggered pulse spacing is examined in the context of the ASR-9 wind-shear processor (WSP). Benefits derived from the new designs include an improved (optimal) stopband design, an increased yield in pulse samples for moments estimation, and the retention of pulse-stagger phase information, which can be used for velocity dealiasing. These improvements are demonstrated using simulated and test-bed data, the latter acquired during 1991/1992 Orlando operations. Filter utilization, in the context of a pre-existing adaptive selection scheme (1) and the Orlando (FL) clutter environment, is examined using the new filters, and areas for improvement are identified.

Safety analysis of the Traffic Information Service

Published in:
MIT Lincoln Laboratory Report ATC-226

Summary

Traffic Information Service (TIS) is a Mode S data link application being developed for use by general aviation (GA) pilots. Its purpose is to provide a low-cost means of assisting the pilot in visual acquisition of nearby aircraft. The service provides two functions: traffic alerting and threat assessment. These functions are also performed by the Traffic Alert and Collision Avoidance System (TCAS). The purpose of this report is to evaluate the effectiveness and safety of TIS in relation to that of TCAS I. The analysis begins with a conceptual review of Andrews' statistical model of visual acquisition. Next, the surveillance systems and threat-detection logic of TIS and TCAS I are reviewed. Results of a Monte Carlo simulation that modeled the threat-assessment performance of TCAS I and TIS are also presented. The analysis supports the conclusion that, because of the high degree of similarity between TIS and TCAS I, TIS is a safe and effective means of assisting the pilot in visual acquisition of air traffic.

Obtaining low sidelobes using non-linear FM pulse compression

Published in:
MIT Lincoln Laboratory Report ATC-223

Summary

Airport Surveillance Radar (ASR) manufacturers are proposing the use of non-linear FM pulse compression in their all-solid-state radars. However, there is concern that the use of pulse compression will limit the radar's performance. High range sidelobes can cause poor performance in both target and weather detection. The theory of non-linear FM pulse compression is derived along with a method of minimizing the sidelobes using a minimum mean square error (MMSE) technique. The results of a computer program using the MMSE technique show that very low sidelobe levels of more than 100 dB down may be achieved. These very low sidelobes are affected by filter misalignment, target Doppler, and by transmitter phase errors or stability. Curves are presented demonstrating these effects. We also show how filter misalignment can be corrected by receiver filtering. The methods presented here are general enough to be used to assess the performance of proposed non-linear FM waveform radars.

Energy onset times for speaker identification

Published in:
IEEE Signal Process. Lett., Vol. 1, No. 11, November 1994, pp. 160-162.

Summary

Onset times of resonant energy pulses are measured with the high-resolution Teager operator and used as features in the Reynolds Gaussian-mixture speaker identification algorithm. Feature sets are constructed with primary pitch and secondary pulse locations derived from low and high speech formants. Preliminary testing was performed with a confusable 40-speaker subset from the NTIMIT (telephone channel) database. Speaker identification improved from 55% to 70% correct classification when the full set of new resonant energy-based features was added as an independent stream to conventional mel-cepstra.
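
As background for the operator mentioned above, the discrete-time Teager energy operator is commonly written as psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below applies that standard form to a synthetic signal. The upward threshold crossing used to mark an onset is a hypothetical stand-in, not the paper's procedure, and the function names and test signal are invented for illustration.

```python
import math

def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The two end samples have no neighbours on one side, so they are padded
    with 0.0 to keep the output aligned with the input indices.
    """
    interior = [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return [0.0] + interior + [0.0]

def onset_indices(energy, threshold):
    """Hypothetical onset rule: indices where energy crosses threshold upward."""
    return [
        n for n in range(1, len(energy))
        if energy[n - 1] < threshold <= energy[n]
    ]

# For a pure tone A*cos(omega*n) the operator returns A^2 * sin(omega)^2.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
tone_energy = teager_energy(tone)[1:-1]  # drop the padded end points
expected_energy = amplitude ** 2 * math.sin(omega) ** 2

# Silence followed by the same tone: the energy jumps where the tone starts.
burst = [0.0] * 20 + tone[:30]
onsets = onset_indices(teager_energy(burst), threshold=0.1)
```

Because the operator uses only three adjacent samples, its output responds within one sample of the start of the burst, which is the kind of time resolution the summary refers to.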

GPS-squitter experimental results

Published in:
13th AIAA/IEEE Digital Avionics Systems Conf., 30 October - 3 November 1994, pp. 521-527.

Summary

GPS-Squitter is a system concept that merges the capabilities of Automatic Dependent Surveillance (ADS) and the Mode S beacon radar. The result is an integrated concept for seamless surveillance and data link that permits equipped aircraft to participate in ADS or beacon ground environments. This offers many possibilities for transition from beacon to ADS-based surveillance. This paper briefly defines the GPS-Squitter concept and its principal applications. The thrust of the paper is the presentation of surface and airborne surveillance measurements made at Hanscom Field in Bedford, Massachusetts and at the Logan International Airport in Boston. In each case the measurements show the excellent surveillance performance provided by this concept.

Formant AM-FM for speaker identification

Published in:
Proc. IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, 25-28 October 1994, pp. 608-611.

Summary

The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude and frequency modulations of speech formants. Such modulations are measured using a high-resolution energy operator and related algorithms for recovering amplitude and frequency from an AM-FM signal. When these features are added to traditional features using an existing SID system with a telephone speech database, SID performance improved by as much as 15%. Energy onset time measurements that yielded improved SID performance are also discussed.

Summer 1992 Terminal area-Local Analysis and Prediction System (T-LAPS) evaluation

Published in:
MIT Lincoln Laboratory Report ATC-218

Summary

The Integrated Terminal Weather System (ITWS) is a development program initiated by the Federal Aviation Administration (FAA) to produce a fully automated, integrated terminal weather information system to improve the safety, efficiency and capacity of terminal area aviation operations. The ITWS will acquire data from FAA and National Weather Service sensors as well as from aircraft in flight in the terminal area. The ITWS will provide Air Traffic personnel with products that are immediately usable without further meteorological interpretation. Among the products are current terminal area weather, short-term (0-30 minute) predictions of significant weather phenomena, and the Terminal Winds product. The Terminal Winds product is the component of the ITWS which produces estimates of the horizontal winds on a three-dimensional grid of points encompassing an airport terminal region. It uses information from a variety of sensors, including Doppler weather radars. In 1992, an operational test of an initial prototype Terminal Winds system was conducted at the MIT Lincoln Laboratory testbed in Orlando, FL. This report describes our evaluation of the initial Terminal Winds prototype.