Publications

Demonstrations and applications of spoken language technology: highlights and perspectives from the 1993 ARPA Spoken Language Technology and Applications Day

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Speech Processing, 19-22 April 1994, pp. 337-340.

Summary

The ARPA Spoken Language Technology and Applications Day (SLTA'93) was a special workshop that presented a set of live, state-of-the-art demonstrations of speech recognition and spoken language understanding systems. The purpose of this paper is to provide perspective on the current opportunities for applications that these systems can enable, and to review the application opportunities and needs cited by panelists and other members of the user community.

Integrated models of signal and background with application to speaker identification in noise

Published in:
IEEE Trans. Speech Audio Process., Vol. 2, No. 2, April 1994, pp. 245-257.

Summary

This paper is concerned with the problem of robust parametric model estimation and classification in noisy acoustic environments. Characterization and modeling of the external noise sources in these environments is in itself an important issue in noise compensation. The techniques described here provide a mechanism for integrating parametric models of acoustic background with the signal model so that noise compensation is tightly coupled with signal model training and classification. Prior information about the acoustic background process is provided using a maximum likelihood parameter estimation procedure that integrates an a priori model of acoustic background with the signal model. An experimental study is presented in the paper on the application of this approach to text-independent speaker identification in noisy acoustic environments. Considerable improvement in speaker classification performance was obtained for classifying unlabeled sections of conversational speech utterances from a 16-speaker population under cross-environment training and testing conditions.

Digital signal processing applications in cochlear-implant research

Published in:
Lincoln Laboratory Journal, Vol. 7, No. 1, Spring 1994, pp. 31-62.

Summary

We have developed a facility that enables scientists to investigate a wide range of sound-processing schemes for human subjects with cochlear implants. This digital signal processing (DSP) facility, named the Programmable Interactive System for Cochlear Implant Electrode Stimulation (PISCES), was designed, built, and tested at Lincoln Laboratory and then installed at the Cochlear Implant Research Laboratory (CIRL) of the Massachusetts Eye and Ear Infirmary (MEEI). New stimulator algorithms that we designed and ran on PISCES have resulted in speech-reception improvements for implant subjects relative to commercial implant stimulators. These improvements were obtained as a result of interactive algorithm adjustment in the clinic, thus demonstrating the importance of a flexible signal-processing facility. Research has continued in the development of a laboratory-based, software-controlled, real-time speech processing system; the exploration of new sound-processing algorithms for improved electrode stimulation; and the design of wearable stimulators that will allow subjects full-time use of stimulator algorithms developed and tested in a laboratory setting.

Figure of merit training for detection and spotting

Published in:
Proc. Neural Information Processing Systems, NIPS, 29 November - 2 December 1993.

Summary

Spotting tasks require detection of target patterns from a background of richly varied non-target inputs. The performance measure of interest for these tasks, called the figure of merit (FOM), is the detection rate for target patterns when the false alarm rate is in an acceptable range. A new approach to training spotters is presented which computes the FOM gradient for each input pattern and then directly maximizes the FOM using back propagation. This eliminates the need for thresholds during training. It also uses network resources to model Bayesian a posteriori probability functions accurately only for patterns which have a significant effect on the detection accuracy over the false alarm rate of interest. FOM training increased detection accuracy by 5 percentage points for a hybrid radial basis function (RBF) - hidden Markov model (HMM) wordspotter on the credit-card speech corpus.
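
As a rough illustration of the performance measure described in this summary, the sketch below computes a figure of merit from detector scores. It assumes, as is common in wordspotting evaluation but not stated in the summary, that the FOM is the detection rate averaged over the operating points reached as the threshold is lowered to admit the first 1 to K false alarms. The function name and parameters are hypothetical, and the sketch covers only the measure, not the gradient-based training the paper presents.

```python
def figure_of_merit(target_scores, nontarget_scores, max_false_alarms):
    """Average detection rate over the first `max_false_alarms` false alarms.

    Assumption (not from the summary): each operating point is the detection
    rate just before the k-th false alarm, for k = 1..max_false_alarms.
    """
    if not target_scores or not nontarget_scores or max_false_alarms < 1:
        raise ValueError("need targets, non-targets, and max_false_alarms >= 1")

    # The highest-scoring non-targets become false alarms first
    # as the detection threshold is lowered.
    thresholds = sorted(nontarget_scores, reverse=True)[:max_false_alarms]

    rates = []
    for thr in thresholds:
        # Targets scoring strictly above the k-th false alarm are detected.
        detected = sum(1 for s in target_scores if s > thr)
        rates.append(detected / len(target_scores))

    return sum(rates) / len(rates)


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.4, 0.2]
    nontargets = [0.85, 0.5, 0.3, 0.1]
    print(figure_of_merit(targets, nontargets, max_false_alarms=2))
```

Because the measure depends only on how target scores rank against the top-scoring non-targets, patterns far from that region have no effect on it, which is consistent with the summary's point about where network resources are spent.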

Energy separation in signal modulations with application to speech analysis

Published in:
IEEE Trans. Signal Process., Vol. 41, No. 10, October 1993, pp. 3024-3051.

Summary

Oscillatory signals that have both an amplitude-modulation (AM) and a frequency-modulation (FM) structure are encountered in almost all communication systems. We have also used these structures recently for modeling speech resonances, being motivated by previous work on investigating fluid dynamics phenomena during speech production that provide evidence for the existence of modulations in speech signals. In this paper, we use a nonlinear differential operator that can detect modulations in AM-FM signals by estimating the product of their time-varying amplitude and frequency. This operator essentially tracks the energy needed by a source to produce the oscillatory signal. To solve the fundamental problem of estimating both the amplitude envelope and instantaneous frequency of an AM-FM signal we develop a novel approach that uses nonlinear combinations of instantaneous signal outputs from the energy operator to separate its output energy product into its amplitude modulation and frequency modulation components. The theoretical analysis is done first for continuous-time signals. Then several efficient algorithms are developed and compared for estimating the amplitude envelope and instantaneous frequency of discrete-time AM-FM signals. These energy separation algorithms are then applied to search for modulations in speech resonances, which we model using AM-FM signals to account for time-varying amplitude envelopes and instantaneous frequencies. Our experimental results provide evidence that bandpass filtered speech signals around speech formants contain amplitude and frequency modulations within a pitch period. Overall, the energy separation algorithms, due to their very low computational complexity and instantaneously-adapting nature, are very useful in detecting modulation patterns in speech and other time-varying signals.
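
The discrete-time idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard discrete energy operator Psi[x(n)] = x(n)^2 - x(n-1)x(n+1) and one symmetric-difference separation formula, commonly referred to in the literature as DESA-2. For a pure tone A cos(Wn + p), Psi[x] = A^2 sin^2(W), and applying the operator to y(n) = x(n+1) - x(n-1) gives 4 A^2 sin^4(W), so the two outputs can be combined to recover A and W separately. Function names are hypothetical.

```python
import math


def teager(x):
    """Discrete energy operator on interior samples: x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """Estimate amplitude envelope and frequency (rad/sample) of an AM-FM signal.

    Uses the symmetric difference y[n] = x[n+1] - x[n-1]. Estimates are
    returned for samples n = 2 .. len(x)-3; frequency is only resolved
    for 0 < W < pi/2. Samples with non-positive energy yield NaN.
    """
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager(x)  # psi_x[k] belongs to sample n = k + 1
    psi_y = teager(y)  # psi_y[k] belongs to sample n = k + 2

    amps, freqs = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align both energies at sample n = k + 2
        if px <= 0.0 or py <= 0.0:
            amps.append(float("nan"))
            freqs.append(float("nan"))
            continue
        # 1 - Psi[y] / (2 Psi[x]) = cos(2W) for a pure tone; clamp for noise.
        arg = max(-1.0, 1.0 - py / (2.0 * px))
        freqs.append(0.5 * math.acos(arg))
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs


if __name__ == "__main__":
    # Pure tone with amplitude 1.5 and frequency 0.3 rad/sample.
    tone = [1.5 * math.cos(0.3 * n + 0.2) for n in range(50)]
    amps, freqs = desa2(tone)
    # The estimates should be close to 1.5 and 0.3 at every sample.
    print(amps[0], freqs[0])
```

Each estimate uses only a handful of neighboring samples, which is what gives this family of algorithms the low computational cost and near-instantaneous adaptation mentioned in the summary.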

Automatic language identification using Gaussian mixture and hidden Markov models

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 2, Speech Processing, ICASSP, 27-30 April 1993, pp. 399-402.

Summary

Ergodic, continuous-observation hidden Markov models (HMMs) were used to perform automatic language classification and detection of speech messages. State observation probability densities were modeled as tied Gaussian mixtures. The algorithm was evaluated on four multilanguage speech databases: a three-language subset of the Spoken Language Library, a three-language subset of a five-language Rome Laboratory database, the 20-language CCITT database, and the ten-language OGI telephone speech database. Generally, performance of a single-state HMM (i.e., a static Gaussian mixture classifier) was comparable to that of the multistate HMMs, indicating that the sequential modeling capabilities of HMMs were not exploited.
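
To make the single-state case concrete, the following is a minimal sketch of a static Gaussian mixture classifier: each language has its own mixture, every feature frame is scored independently, and the language with the highest total log-likelihood wins. It is an illustration under simplifying assumptions, not the paper's system: covariances are diagonal, the mixtures are not tied, and the model parameters, feature values, and language labels are all hypothetical.

```python
import math


def gmm_frame_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(ll)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def classify(frames, models):
    """Return (best_label, scores); frames are scored independently of order."""
    scores = {
        label: sum(gmm_frame_loglik(f, w, mu, var) for f in frames)
        for label, (w, mu, var) in models.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical two-dimensional models for two placeholder languages.
    models = {
        "lang_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
        "lang_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    frames = [[0.1, -0.2], [0.9, 1.1]]
    print(classify(frames, models)[0])
```

Because the total score is a plain sum over frames, shuffling the frames leaves the decision unchanged. That order-independence is what distinguishes this classifier from a multistate HMM, which can model sequential structure.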

Detection of transient signals using the energy operator

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 3, ICASSP, 27-30 April 1993, pp. 145-148.

Summary

A function of the Teager-Kaiser energy operator is introduced as a method for detecting transient signals in the presence of amplitude-modulated and frequency-modulated tonal interference. This function has excellent time resolution and is robust in the presence of white noise. The output of the detection function is also independent of the interference-to-transient ratio when that ratio is large. It is demonstrated that the detection function can be applied to interference signals with multiple amplitude-modulated and frequency-modulated tonal components.

Time-scale modification of complex acoustic signals

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Plenary, Special, Audio, Underwater Acoustics, VLSI, Neural Networks, 27-30 April 1993, pp. 213-216.

Summary

A new approach is introduced for time-scale modification of short-duration complex acoustic signals to improve their audibility. The technique constrains the modified signal to take on a specified spectral characteristic while imposing a time-scaled version of the original temporal envelope. Both full-band and sub-band representations of the temporal envelope are considered. In the full-band case, the modified signal is obtained by appropriate selection of its Fourier transform phase. In the sub-band case, using locations of maxima in the sub-band temporal envelopes, the phase of each bandpass signal is formed to preserve "events" in the envelope of the composite signal. The approach is applied to synthetic and actual short-duration acoustic signals consisting of closely-spaced and overlapping sequential time components.

Time-scale modification with temporal envelope invariance

Published in:
Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 17-20 October 1993, pp. 127-130.

Summary

A new approach is introduced for time-scale modification of short-duration complex acoustic signals to improve their audibility. The method preserves the time-scaled temporal envelope of a signal and, for enhancement, capitalizes on the perceptual importance of a signal's temporal structure. The basis for the approach is a sub-band representation whose channel phases are controlled to shape the temporal envelope of the time-scaled signal. The phase control is derived from locations of events that occur within filterbank outputs. A frame-based generalization of the method imposes phase consistency across consecutive synthesis frames. The approach is applied to synthetic and actual short-duration acoustic signals consisting of closely spaced and overlapping sequential time components.

Two-talker pitch tracking for co-channel talker interference suppression

Published in:
MIT Lincoln Laboratory Report TR-951

Summary

Almost all co-channel talker interference suppression systems use the difference in the pitches of the target and jammer speakers to suppress the jammer and enhance the target. While joint pitch estimators outputting two pitch estimates as a function of time have been proposed, the task of proper assignment of pitch to speaker (two-talker pitch tracking) has proven difficult. This report describes several approaches to the two-talker pitch tracking problem including algorithms for pitch track interpolation, spectral envelope tracking, and spectral envelope classification. When evaluated on an all-voiced two-talker database, the best of these new tracking systems correctly assigned pitch 87% of the time given perfect joint pitch estimation.