Publications

Analysis of nonmodal phonation using minimum entropy deconvolution

Published in:
Proc. Int. Conf. on Spoken Language Processing, ICSLP INTERSPEECH, 17-21 September 2006, pp. 1702-1705.

Summary

Nonmodal phonation occurs when glottal pulses exhibit nonuniform pulse-to-pulse characteristics such as irregular spacings, amplitudes, and/or shapes. The analysis of regions of such nonmodality has application to automatic speech, speaker, language, and dialect recognition. In this paper, we examine the usefulness of a technique called minimum-entropy deconvolution, or MED, for the analysis of pulse events in nonmodal speech. Our study presents evidence for both natural and synthetic speech that MED decomposes nonmodal phonation into a series of sharp pulses and a set of mixed-phase impulse responses. We show that the estimated impulse responses are quantitatively similar to those in our synthesis model. A hybrid method incorporating aspects of both MED and linear prediction is also introduced. We show preliminary evidence that the hybrid method has benefit over MED alone for composite impulse-response estimation by being more robust to short-time windowing effects as well as a speech aspiration noise component.

Lincoln Laboratory high-speed solid-state imager technology

Published in:
SPIE Vol. 6279, 27th Int. Congress on High-Speed Photography and Photonics, 17-22 September 2006, 62791K.

Summary

Massachusetts Institute of Technology, Lincoln Laboratory (MIT LL) has been developing both continuous and burst solid-state focal-plane-array technology for a variety of high-speed imaging applications. For continuous imaging, a 128 x 128-pixel charge coupled device (CCD) has been fabricated with multiple output ports for operating rates greater than 10,000 frames per second with readout noise of less than 10 e- rms. An electronic shutter has been integrated into the pixels of the back-illuminated (BI) CCD imagers that give snapshot exposure times of less than 10 ns. For burst imaging, a 5 cm x 5 cm, 512 x 512-element, multi-frame CCD imager that collects four sequential image frames at megahertz rates has been developed for the Los Alamos National Laboratory Dual Axis Radiographic Hydrodynamic Test (DARHT) facility. To operate at fast frame rates with high sensitivity, the imager uses the same electronic shutter technology as the continuously framing 128 x 128 CCD imager. The design concept and test results are described for the burst-frame-rate imager. Also discussed is an evolving solid-state imager technology that has interesting characteristics for creating large-format x-ray detectors with ultra-short exposure times (100 to 300 ps). The detector will consist of CMOS readouts for high speed sampling (tens of picoseconds transistor switching times) that are bump bonded to deep-depletion silicon photodiodes. A 64 x 64-pixel CMOS test chip has been designed, fabricated and characterized to investigate the feasibility of making large-format detectors with short, simultaneous exposure times.

Reducing speech coding distortion for speaker identification

Published in:
Int. Conf. on Spoken Language Processing, ICSLP, 17-21 September 2006.

Summary

In this paper, we investigate the degradation of speaker identification performance due to speech coding algorithms used in digital telephone networks, cellular telephony, and voice over IP. By analyzing the difference between front-end feature vectors derived from coded and uncoded speech in terms of spectral distortion, we are able to quantify this coding degradation. This leads to two novel methods for distortion compensation: codebook and LPC compensation. Both are shown to significantly reduce front-end mismatch, with the second approach providing the most encouraging results. Full experiments using a GMM-UBM speaker ID system confirm the usefulness of both the front-end distortion analysis and the LPC compensation technique.

Pitch-scale modification using the modulated aspiration noise source

Published in:
INTERSPEECH, 17-21 September 2006.

Summary

Spectral harmonic/noise component analysis of spoken vowels shows evidence of noise modulations with peaks in the estimated noise source component synchronous with both the open phase of the periodic source and with time instants of glottal closure. Inspired by this observation of natural modulations and of fullband energy in the aspiration noise source, we develop an alternate approach to high-quality pitch-scale modification of continuous speech. Our strategy takes a dual processing approach, in which the harmonic and noise components of the speech signal are separately analyzed, modified, and re-synthesized. The periodic component is modified using standard modification techniques, and the noise component is handled by modifying characteristics of its source waveform. Since we have modeled an inherent coupling between the periodic and aspiration noise sources, the modification algorithm is designed to preserve the synchrony between temporal modulations of the two sources. The reconstructed modified signal is perceived in informal listening to be natural-sounding and typically reduces artifacts that occur in standard modification techniques.

Missing feature theory with soft spectral subtraction for speaker verification

Published in:
Interspeech 2006, ICSLP, 17-21 September 2006.

Summary

This paper considers the problem of training/testing mismatch in the context of speaker verification and, in particular, explores the application of missing feature theory in the case of additive white Gaussian noise corruption in testing. Missing feature theory allows for corrupted features to be removed from scoring, the initial step of which is the detection of these features. One method of detection, employing spectral subtraction, is studied in a controlled manner and it is shown that with missing feature compensation the resulting verification performance is improved as long as a minimum number of features remain. Finally, a blending of "soft" spectral subtraction for noise mitigation and missing feature compensation is presented. The resulting performance improves on the constituent techniques alone, reducing the equal error rate by about 15% over an SNR range of 5 - 25 dB.

An overview of automatic speaker diarization systems

Published in:
IEEE Trans. Audio, Speech, and Language Processing, Vol. 14, No. 5, September 2006, pp. 1557-1565.

Summary

Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating the searching and indexing of audio archives, and increasing the richness of automatic transcriptions, making them more readable. In this paper, we provide an overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, and discuss their relative merits and limitations. Performances using the different techniques are compared within the framework of the speaker diarization task in the DARPA EARS Rich Transcription evaluations. We also look at how the techniques are being introduced into real broadcast news systems and their portability to other domains and tasks such as meetings and speaker verification.

Coherent beam combining of large number of PM fibres in 2-D fibre array

Published in:
Electron. Lett., Vol. 42, No. 18, 31 August 2006, pp. 17-18.

Summary

Coherent combining of a record 48 PM fibres in a phased array configuration is reported. The resulting Strehl ratio degrades by

Using filter banks to improve interceptor performance against weaving targets

Published in:
AIAA Guidance, Navigation, and Control Conf., 21-24 August 2006.

Summary

It is well known that interceptor performance against a weaving or spiraling target can be improved by use of a special-purpose weave guidance law. However, the weave guidance law requires knowledge of the target weave frequency. When the target weave frequency is unknown, an extended Kalman filter is usually considered for the problem because it can be used to estimate the target weave frequency. However, the performance of the extended Kalman filter is sensitive to initialization errors. This paper offers an unusual linear Kalman filter bank approach, where each filter is tuned to a different target weave frequency, as a potential solution for estimating the target weave frequency. Rather than combining individual filter outputs in some probabilistic sense, a straightforward algorithm is presented for choosing the filter that is most closely tuned to the actual target weave frequency. This paper demonstrates that this filter bank approach is superior to that of the extended Kalman filter for the weaving target problem.

An end-to-end demonstration of a receiver array based free-space photon counting communications link

Published in:
SPIE Vol. 6304, Free-Space Laser Communications VI, 13-17 August 2006, pp. 63040H-1 - 63040H-13.

Summary

NASA anticipates a significant demand for long-haul communications service from deep-space to Earth in the near future. To address this need, a substantial effort has been invested in developing a free-space laser communications system that can be operated at data rates that are 10-1000 times higher than current RF systems. We have built an end-to-end free-space photon counting testbed to demonstrate many of the key technologies required for a deep space optical receiver. The testbed consists of two independent receivers, each using a Geiger-mode avalanche photodiode detector array. A hardware aggregator combines the photon arrivals from the two receivers and the aggregated photon stream is decoded in real time with a hardware turbo decoder. We have demonstrated signal acquisition, clock synchronization, and error free communications at data rates up to 14 million bits per second while operating within 1 dB of the channel capacity with an efficiency of greater than 1 bit per incident photon.

Toward an interagency language roundtable based assessment of speech-to-speech translation capabilities

Published in:
AMTA 2006, 7th Biennial Conf. of the Association for Machine Translation in the Americas, 8-12 August 2006.

Summary

We present observations from three exercises designed to map the effective listening and speaking skills of an operator of a speech-to-speech translation system (S2S) to the Interagency Language Roundtable (ILR) scale. Such a mapping is nontrivial, but will be useful for government and military decision makers in managing expectations of S2S technology. We observed domain-dependent S2S capabilities in the ILR range of Level 0+ to Level 1, and interactive text-based machine translation in the Level 3 range.