Publications

A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters

Author:
Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Speech and Language Processing, ICASSP, 14-19 May 2006, pp. 705-708.

Summary

We present the framework for a Scalable Phonetic Vocoder (SPV) capable of operating at bit rates from 300 to 1100 bps. The underlying system uses an HMM-based phonetic speech recognizer to estimate the parameters for MELP speech synthesis. We extend this baseline technique in three ways. First, we introduce the concept of predictive time evolution to generate a smoother path for the synthesizer parameters, and show that it improves speech quality. Then, since the output speech from the phonetic vocoder is still limited by such low bit rates, we propose a scalable system where the accuracy of the MELP parameters is increased by vector quantizing the error signal between the true and phonetic-estimated MELP parameters. Finally, we apply an extremely flexible technique for exploiting correlations in these parameters over time, which we call Joint Predictive Vector Quantization (JPVQ). We show that significant quality improvement can be attained by adding as few as 400 bps to the baseline phonetic vocoder using JPVQ. The resulting SPV system provides a flexible platform for adjusting the phonetic vocoder bit rate and speech quality.
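
The idea of vector quantizing the error between true and phonetic-estimated parameters can be illustrated with a minimal sketch. This is plain nearest-neighbor residual quantization, not the JPVQ scheme itself, and the parameter vectors, the codebook, and the function name are all hypothetical values chosen for illustration.

```python
import numpy as np

def quantize_residual(true_params, estimated_params, codebook):
    """Pick the codeword closest to the estimation error and add it back.

    Returns (index, refined), where index is what would be transmitted
    and refined is the decoder-side parameter vector.
    """
    residual = true_params - estimated_params
    # Squared Euclidean distance from the residual to every codeword.
    distances = np.sum((codebook - residual) ** 2, axis=1)
    index = int(np.argmin(distances))
    return index, estimated_params + codebook[index]

# Hypothetical 3-dimensional parameter vectors (not real MELP values).
true_params = np.array([0.9, -0.2, 0.4])
estimated_params = np.array([0.7, 0.0, 0.5])

# Hypothetical 4-entry codebook; the zero codeword guarantees the
# refinement is never worse than the unrefined estimate.
codebook = np.array([
    [0.0, 0.0, 0.0],
    [0.25, -0.25, 0.0],
    [-0.25, 0.25, 0.0],
    [0.0, 0.0, 0.25],
])

index, refined = quantize_residual(true_params, estimated_params, codebook)
baseline_error = np.linalg.norm(true_params - estimated_params)
refined_error = np.linalg.norm(true_params - refined)
bits_per_vector = int(np.log2(len(codebook)))  # side information per vector
```

The extra bit rate of such a layer is the number of bits per quantized vector times the number of vectors sent per second, which is what makes the scheme scalable: a larger codebook costs more bits and leaves a smaller error.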

SVM based speaker verification using a GMM supervector kernel and NAP variability compensation

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Speech and Language Processing, ICASSP, Vol. 1, 14-19 May 2006, pp. 97-100.

Summary

Gaussian mixture models with universal backgrounds (UBMs) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique.
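
As a rough illustration of the supervector construction described above, the following sketch MAP-adapts the means of a small diagonal-covariance UBM and stacks them. The relevance factor of 16, the diagonal covariances, the toy UBM, and the function names are assumptions made for the example and are not taken from the abstract.

```python
import numpy as np

def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to one speaker's frames (diagonal covariances).

    features: (T, D) frames; weights: (K,); means, variances: (K, D).
    relevance is an assumed relevance factor, not a value from the source.
    """
    diff = features[:, None, :] - means[None, :, :]            # (T, K, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                      # (T, K)
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)                                     # responsibilities
    counts = post.sum(axis=0)                                   # soft counts (K,)
    data_means = (post.T @ features) / np.maximum(counts, 1e-10)[:, None]
    alpha = (counts / (counts + relevance))[:, None]            # adaptation weight
    return alpha * data_means + (1.0 - alpha) * means

def supervector(adapted_means):
    """Stack the K adapted mean vectors into one (K*D,) vector."""
    return adapted_means.reshape(-1)

# Toy UBM with K=2 components in D=2 dimensions (hypothetical values).
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[-1.0, 0.0], [1.0, 0.0]])
ubm_variances = np.ones((2, 2))

rng = np.random.default_rng(0)
speaker_frames = rng.normal(loc=[1.5, 0.5], scale=1.0, size=(200, 2))

adapted = map_adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_variances)
sv = supervector(adapted)
```

Components that see little speaker data keep means close to the UBM, while well-observed components move toward the data, so the supervector has a fixed length regardless of utterance duration.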

Evaluation of proposed changes to the ACAS modified tau calculation

Author:
Published in:
Int. Civil Aviation Organization Aeronautical Surveillance Panel Working Group, 1 May 2006.

Summary

Modified tau is a parameter computed by ACAS to estimate the earliest time at which a collision could occur should an intruder aircraft accelerate toward the own aircraft. A concern with the modified tau calculation has been raised in a class of encounters where intruders are already close and converging slowly. In these problem cases, ACAS may induce a Near Mid-Air Collision by generating RAs with inappropriate timing or initial sense or failing to reverse sense when necessary. Performance in some problem encounters is greatly improved when using several proposed changes to the modified tau equations. These changes are outside CP112E, which focuses only on RA reversals. Although changes to modified tau resolve some problem encounters, aggregate risk-ratio results do not support implementing the existing proposals. There remains a concern about mid-air collision risk due to vulnerability in the existing modified tau equations, yet a robust solution to the problem has not been developed.
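
For readers unfamiliar with the quantity, the sketch below contrasts simple tau (range divided by closure rate) with the commonly published form of modified tau, which subtracts a distance-modification threshold (DMOD). The DMOD value and the encounter geometry are hypothetical, the clamping at zero is a simplification, and none of the proposed changes evaluated in the paper are represented here.

```python
import math

def simple_tau(range_nmi, range_rate_nmi_s):
    """Time to closest approach assuming constant closure rate, in seconds."""
    if range_rate_nmi_s >= 0:          # not converging
        return math.inf
    return -range_nmi / range_rate_nmi_s

def modified_tau(range_nmi, range_rate_nmi_s, dmod_nmi):
    """Commonly published modified tau: -(r^2 - DMOD^2) / (r * rdot).

    dmod_nmi is an assumed threshold; real values depend on sensitivity level.
    """
    if range_rate_nmi_s >= 0:          # not converging
        return math.inf
    tau = -(range_nmi ** 2 - dmod_nmi ** 2) / (range_nmi * range_rate_nmi_s)
    return max(tau, 0.0)               # inside DMOD: treat as zero (simplification)

# Hypothetical slow-closure encounter: 1.0 nmi apart, closing at 18 kt.
r = 1.0
rdot = -18.0 / 3600.0                  # nmi per second
dmod = 0.8                             # hypothetical DMOD in nmi

tau_simple = simple_tau(r, rdot)       # about 200 s
tau_mod = modified_tau(r, rdot, dmod)  # about 72 s
```

In this slow-closure case simple tau stays large even though the aircraft are already close, while modified tau is much smaller, which is the class of encounter the abstract identifies as problematic.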

Update on the analysis of ACAS performance on Global Hawk

Author:
Published in:
Int. Civil Aviation Organization Aeronautical Surveillance Panel Working Group, 1 May 2006.

Summary

Initial results are presented from a Lincoln Laboratory study of ACAS performance on the Global Hawk UAV. The study has been applying the process outlined in the ICAO ACAS Manual, which involves developing UAV airspace encounter models and running fast-time Monte Carlo simulations of encounters. ACAS performance was examined in conventional aircraft vs. conventional aircraft, conventional aircraft vs. non-ACAS Global Hawk, and conventional aircraft vs. ACAS-equipped Global Hawk cases. The existing ICAO and ACASA encounter models were modified to reflect Global Hawk flight characteristics. ACAS performance on Global Hawk was also assessed parametrically across reaction latencies from 0 to 20 s. Global Hawk flight characteristics were shown to have a small but measurable negative impact on collision risk. Assuming no system failures or visual acquisition effects occur, performance with ACAS on Global Hawk is significantly better than without ACAS if response latencies (from the moment an RA is issued to the moment maneuvering begins) are less than 10 s. Performance drops off rapidly at latencies greater than 10 s. The needs for improved airspace models and a more in-depth study of the interaction between visual acquisition and ACAS are noted.

Support vector machines using GMM supervectors for speaker verification

Published in:
IEEE Signal Process. Lett., Vol. 13, No. 5, May 2006, pp. 308-311.

Summary

Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.
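
The abstract does not spell out the two kernels. As an illustration only, the sketch below shows one linear kernel between GMM mean supervectors in which each component's means are scaled by the square root of its mixture weight and by the inverse standard deviation. The scaling, the shared diagonal covariances, and all numeric values are assumptions, not a statement of the paper's method.

```python
import numpy as np

def scaled_supervector(means, weights, variances):
    """Map (K, D) means to a (K*D,) vector scaled by sqrt(w_k) / sigma_k.

    weights: (K,), variances: (K, D); both assumed shared across speakers.
    """
    scale = np.sqrt(weights)[:, None] / np.sqrt(variances)
    return (scale * means).reshape(-1)

def supervector_kernel(means_a, means_b, weights, variances):
    """Inner product of scaled supervectors (a valid linear kernel)."""
    phi_a = scaled_supervector(means_a, weights, variances)
    phi_b = scaled_supervector(means_b, weights, variances)
    return float(phi_a @ phi_b)

# Hypothetical shared mixture weights and variances, K=2, D=2.
weights = np.array([0.3, 0.7])
variances = np.array([[1.0, 4.0], [0.25, 1.0]])

# Hypothetical adapted means for two utterances.
means_a = np.array([[0.5, -1.0], [1.0, 2.0]])
means_b = np.array([[0.0, 1.0], [1.5, 1.0]])

k_ab = supervector_kernel(means_a, means_b, weights, variances)
k_ba = supervector_kernel(means_b, means_a, weights, variances)
k_aa = supervector_kernel(means_a, means_a, weights, variances)
```

Because the kernel is an explicit inner product, it is symmetric and positive semidefinite by construction, which is the property an SVM requires of its kernel.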

Multifunction phased array radar pulse compression limits

Author:
Published in:
MIT Lincoln Laboratory Report ATC-327

Summary

An active phased array radar with distributed low-peak-power transmit modules requires pulse compression to provide high sensitivity and fine range resolution. A long transmitted pulse, however, has accompanying problems such as a near-range blind zone for the transmitting channel and a loss of other gate data (dead gates) in other channels for a multichannel system. In this report the trade-off between the benefits and costs of pulse compression (lengthening) for multifunction phased array radars (MPARs) is analyzed. Specific results are presented for a three-channel MPAR and a two-channel terminal-area MPAR (TMPAR) that have been proposed as replacement systems for current U.S. civil-sector aircraft and weather surveillance radar systems. The recommended maximum compression ratio is 130 for the MPAR and 80 for the TMPAR. The results are independent of radar peak power and antenna gain, and represent upper bounds. Actual pulse compression ratios that would be employed are likely to be somewhat less than these values, based on fulfilling specific sensitivity and scan-time requirements with specific radar physical parameters.
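
The link between compression ratio and the near-range blind zone can be made concrete with textbook relations: a monostatic radar cannot receive while it transmits, so a pulse of length T blinds ranges out to cT/2, and the compression ratio is approximately the time-bandwidth product. The 1 MHz bandwidth below is a hypothetical value, and the sketch does not reproduce the report's analysis.

```python
C_M_PER_S = 299_792_458.0  # speed of light

def range_resolution_m(bandwidth_hz):
    """Range resolution of the compressed pulse, c / (2B)."""
    return C_M_PER_S / (2.0 * bandwidth_hz)

def pulse_length_s(compression_ratio, bandwidth_hz):
    """Uncompressed pulse length, assuming ratio = time-bandwidth product."""
    return compression_ratio / bandwidth_hz

def blind_range_m(pulse_length):
    """Range inside which echoes return while the pulse is still being sent."""
    return C_M_PER_S * pulse_length / 2.0

bandwidth_hz = 1.0e6                      # hypothetical waveform bandwidth
for name, ratio in (("MPAR", 130), ("TMPAR", 80)):
    t_pulse = pulse_length_s(ratio, bandwidth_hz)
    print(name, f"pulse {t_pulse * 1e6:.0f} us,",
          f"blind zone {blind_range_m(t_pulse) / 1000:.1f} km")

mpar_blind = blind_range_m(pulse_length_s(130, bandwidth_hz))
mpar_resolution = range_resolution_m(bandwidth_hz)
```

Under these assumptions the blind range equals the compression ratio times the compressed range resolution, so every increase in compression ratio buys sensitivity at the direct cost of near-range coverage.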

Support vector machines for speaker and language recognition

Published in:
Comput. Speech Lang., Vol. 20, No. 2-3, April/July 2006, pp. 210-229.

Summary

Support vector machines (SVMs) have proven to be a powerful technique for pattern classification. SVMs map inputs into a high-dimensional space and then separate classes with a hyperplane. A critical aspect of using SVMs successfully is the design of the inner product, the kernel, induced by the high-dimensional mapping. We consider the application of SVMs to speaker and language recognition. A key part of our approach is the use of a kernel that compares sequences of feature vectors and produces a measure of similarity. Our sequence kernel is based upon generalized linear discriminants. We show that this strategy has several important properties. First, the kernel uses an explicit expansion into SVM feature space; this property makes it possible to collapse all support vectors into a single model vector and have low computational complexity. Second, the SVM builds upon a simpler mean-squared error classifier to produce a more accurate system. Finally, the system is competitive and complementary to other approaches, such as Gaussian mixture models (GMMs). We give results for the 2003 NIST speaker and language evaluations of the system and also show fusion with the traditional GMM approach.
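
The first property, collapsing all support vectors into one model vector, follows from the kernel being an explicit inner product. The sketch below demonstrates this with a degree-2 monomial expansion averaged over each sequence. The expansion degree, the omission of any normalization matrix, and the support-vector coefficients are simplifying assumptions for illustration.

```python
import numpy as np
from itertools import combinations_with_replacement

def expand(x, degree=2):
    """All monomials of the entries of x up to the given degree, plus 1."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)

def sequence_map(frames, degree=2):
    """Map a (T, D) sequence to one vector: the average expanded frame."""
    return np.mean([expand(f, degree) for f in frames], axis=0)

def sequence_kernel(seq_a, seq_b):
    """Explicit inner-product kernel (normalization omitted as an assumption)."""
    return float(sequence_map(seq_a) @ sequence_map(seq_b))

rng = np.random.default_rng(1)
support_seqs = [rng.normal(size=(5, 2)) for _ in range(3)]
coefs = np.array([0.5, -1.0, 0.5])      # hypothetical alpha_i * y_i values
bias = 0.1                              # hypothetical SVM offset
test_seq = rng.normal(size=(7, 2))

# Standard SVM scoring: one kernel evaluation per support vector.
score_sum = sum(c * sequence_kernel(s, test_seq)
                for c, s in zip(coefs, support_seqs)) + bias

# Collapsed scoring: fold the support vectors into a single model vector.
model = sum(c * sequence_map(s) for c, s in zip(coefs, support_seqs))
score_collapsed = float(model @ sequence_map(test_seq)) + bias
```

Both scores agree, but the collapsed form needs only one inner product per test sequence no matter how many support vectors the trained SVM retains.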

Afterpulsing in Geiger-mode avalanche photodiodes for 1.06-µm wavelength

Summary

We consider the phenomenon of afterpulsing in avalanche photodiodes (APDs) operating in gated and free-running Geiger mode. An operational model of afterpulsing and other noise characteristics of APDs predicts the noise behavior observed in the free-running mode. We also use gated-mode data to investigate possible sources of afterpulsing in these devices. For 30-µm-diameter, 1.06-µm-wavelength InGaAsP/InP APDs operated at 290 K and 4 V overbias, we obtained a dominant trap lifetime of td = 0.32 µs, a trap energy of 0.11 eV, and a baseline dark count rate of 245 kHz.
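
To give a feel for what a 0.32-µs trap lifetime implies, the sketch below assumes a single trap level that releases carriers exponentially and computes the fraction of trapped carriers still present after a hold-off time. The single-exponential model and the hold-off values are assumptions for illustration and may be simpler than the operational model used in the paper.

```python
import math

TRAP_LIFETIME_US = 0.32   # dominant trap lifetime quoted in the abstract

def trapped_fraction(hold_off_us, lifetime_us=TRAP_LIFETIME_US):
    """Fraction of trapped carriers not yet released after hold_off_us.

    Assumes one trap level with exponential release (a simplification).
    """
    return math.exp(-hold_off_us / lifetime_us)

# Hypothetical hold-off times in microseconds.
hold_offs_us = [0.32, 1.0, 2.0]
fractions = [trapped_fraction(t) for t in hold_offs_us]

for t, f in zip(hold_offs_us, fractions):
    print(f"hold-off {t:.2f} us -> {100 * f:.2f}% of trapped carriers remain")
```

Under this model, waiting a few trap lifetimes before re-arming the detector removes most of the afterpulsing contribution, at the cost of dead time.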

Exploiting nonacoustic sensors for speech encoding

Summary

The intelligibility of speech transmitted through low-rate coders is severely degraded when high levels of acoustic noise are present. Recent advances in nonacoustic sensors, including microwave radar, skin vibration, and bone conduction sensors, provide the exciting possibility of both glottal excitation and, more generally, vocal tract measurements that are relatively immune to acoustic disturbances and can supplement the acoustic speech waveform. We are currently investigating methods of combining the output of these sensors for use in low-rate encoding according to their capability in representing specific speech characteristics in different frequency bands. Nonacoustic sensors have the ability to reveal certain speech attributes lost in the noisy acoustic signal; for example, low-energy consonant voice bars, nasality, and glottalized excitation. By fusing nonacoustic low-frequency and pitch content with acoustic-microphone content, we have achieved significant intelligibility performance gains using the DRT across a variety of environments over the government standard 2400-bps MELPe coder. By fusing quantized high-band 4-to-8-kHz speech, requiring only an additional 116 bps, we obtain further DRT performance gains by exploiting the ear's insensitivity to fine spectral detail in this frequency region.

Laser radar imager based on 3D integration of Geiger-mode avalanche photodiodes with two SOI timing circuit layers

Summary

We have developed focal-plane arrays and laser-radar (ladar) imaging systems based on Geiger-mode avalanche photodiodes (APDs) integrated with high-speed all-digital CMOS timing circuits. A Geiger-mode APD produces a digital pulse upon detection of a single photon. This pulse is used to stop a fast digital counter in the pixel circuit, thereby measuring photon arrival time. This "photon-to-digital conversion" yields quantum-limited sensitivity and noiseless readout, enabling high-performance ladar systems. Previously reported focal planes, based on bump bonding or epoxy bonding the APDs to foundry chips, had coarse (100-µm) pixel spacing and 0.5-ns timing quantization.
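
The conversion from a stopped counter value to target range is simple time-of-flight arithmetic, sketched below. It assumes the counter starts when the laser pulse leaves the sensor and that each count is one timing bin. The count value and the function name are hypothetical, while the 0.5-ns bin width is the figure quoted for the earlier focal planes.

```python
C_M_PER_S = 299_792_458.0   # speed of light

def range_from_count(count, tick_s):
    """Target range for a counter stopped after `count` ticks.

    Assumes the counter started at laser emission, so count * tick_s is the
    round-trip time and the one-way range is half the distance light travels.
    """
    return C_M_PER_S * count * tick_s / 2.0

TICK_S = 0.5e-9                                   # 0.5-ns timing quantization
range_bin_m = range_from_count(1, TICK_S)         # range step per count
example_range_m = range_from_count(1000, TICK_S)  # hypothetical count of 1000

print(f"range per count: {100 * range_bin_m:.2f} cm")
print(f"count 1000 -> {example_range_m:.2f} m")
```

With 0.5-ns bins each count corresponds to roughly 7.5 cm of range, so finer timing quantization translates directly into finer depth resolution in the image.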