Publications


An approach to co-channel talker interference suppression using a sinusoidal model for speech

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. 38, No. 1, January 1990, pp. 56-59.

Summary

This paper describes a new approach to co-channel talker interference suppression based on a sinusoidal representation of speech. The technique fits a sinusoidal model to additive vocalic speech segments so as to minimize the mean-squared error between the model and the summed waveform. Enhancement is achieved by synthesizing a waveform from the sine waves attributed to the desired speaker. Least-squares estimation is applied to obtain the sine-wave amplitudes and phases of both talkers, based on either a priori sine-wave frequencies or a priori fundamental frequency contours. When the frequencies of the two waveforms are closely spaced, performance is significantly improved by exploiting the time evolution of the sinusoidal parameters across multiple analysis frames. The least-squared-error approach is also extended, under restricted conditions, to estimate the fundamental frequency contours of both speakers from the summed waveform. The results obtained, although limited in scope, provide evidence that the sinusoidal analysis/synthesis model, with effective parameter estimation techniques, offers a promising approach to the problem of co-channel talker interference suppression over a range of conditions.
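
With a priori frequencies, the fit reduces to ordinary linear least squares, since each sine wave A·cos(2πft+φ) is linear in the pair (A·cos φ, −A·sin φ). The following is a minimal sketch of that step, not the paper's system: the function and variable names are hypothetical, and the frequencies of both talkers are assumed known exactly.

```python
import numpy as np

def separate_by_sine_fit(x, fs, freqs_target, freqs_jammer):
    """Least-squares fit of sine waves at known frequencies to a summed
    waveform, then resynthesis of only the target's sine waves.
    (Hypothetical sketch: both talkers' frequencies are given a priori.)"""
    t = np.arange(len(x)) / fs
    freqs = np.concatenate([freqs_target, freqs_jammer])
    # A*cos(2*pi*f*t + phi) = a*cos(2*pi*f*t) + b*sin(2*pi*f*t),
    # so solving for (a, b) at each frequency is linear least squares.
    M = np.hstack([np.cos(2 * np.pi * np.outer(t, freqs)),
                   np.sin(2 * np.pi * np.outer(t, freqs))])
    coef, *_ = np.linalg.lstsq(M, x, rcond=None)
    a, b = coef[:len(freqs)], coef[len(freqs):]
    k = len(freqs_target)
    # Resynthesize only the components attributed to the target talker.
    return (np.cos(2 * np.pi * np.outer(t, freqs_target)) @ a[:k]
            + np.sin(2 * np.pi * np.outer(t, freqs_target)) @ b[:k])
```

With well-separated frequencies and a sufficiently long segment the recovery is essentially exact; the multi-frame extension described above matters precisely when the two talkers' frequencies are closely spaced.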

Spoken language systems

Summary

Spoken language is the most natural and common form of human-human communication, whether face to face, over the telephone, or through various communication media such as radio and television. In contrast, human-machine interaction is currently achieved largely through keyboard strokes, pointing, or other mechanical means, using highly stylized languages. Communication, whether human-human or human-machine, suffers greatly when the two communicating agents do not "speak" the same language. The ultimate goal of work on spoken language systems is to overcome this language barrier by building systems that provide the necessary interpretive function between various languages, thus establishing spoken language as a versatile and natural communication medium between humans and machines and among humans speaking different languages.

Far-echo cancellation in the presence of frequency offset (full duplex modem)

Published in:
IEEE Trans. Commun., Vol. 37, No. 6, June 1989, pp. 635-644.

Summary

In this paper, we present a design for a full-duplex echo-cancelling data modem based on a combined adaptive reference algorithm and adaptive channel equalizer. The adaptive reference algorithm has the advantage that interference to the echo canceller caused by the far-end signal can be eliminated by subtracting an estimate of the far-end signal based on receiver decisions. This technique provides a new approach for full-duplex far-echo cancellation in which the far echo can be cancelled in spite of carrier frequency offset. To estimate the frequency offset, the system uses a separate receiver structure for the far echo, which provides equalization of the far-echo channel and tracks the frequency offset in the far echo. The feasibility of the echo-cancelling algorithms is demonstrated by computer simulation with realistic channel distortions and with 4800 bits/s data transmission, a rate at which frequency offset in the far echo becomes important.
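
The adaptive-reference idea, subtracting a decision-based estimate of the far-end signal so that it does not disturb the echo canceller's adaptation, can be sketched with a plain LMS canceller. This is an illustrative toy under strong assumptions (error-free receiver decisions, no equalizer, no frequency-offset tracking, invented names), not the modem described above.

```python
import numpy as np

def lms_echo_canceller(tx, rx, far_decisions, taps=8, mu=0.01):
    """LMS echo canceller with an 'adaptive reference': the receiver's
    decisions supply an estimate of the far-end signal, which is
    subtracted from the error before adaptation, so the far-end signal
    does not perturb the echo-path estimate. (Toy sketch only.)"""
    w = np.zeros(taps)                      # echo-path estimate
    for i in range(taps - 1, len(tx)):
        x = tx[i - taps + 1:i + 1][::-1]    # recent transmitted samples
        echo_hat = w @ x                    # current echo estimate
        # Remove the decision-based far-end estimate from the error.
        e = rx[i] - far_decisions[i] - echo_hat
        w += mu * e * x                     # LMS weight update
    return w
```

With ideal decisions the residual error contains only the echo mismatch, so the canceller identifies the echo path as if the far-end signal were absent.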

Phase coherence in speech reconstruction for enhancement and coding applications

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Speech Processing 1, 23-26 May 1989, pp. 207-209.

Summary

It has been shown that an analysis-synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially perceptually indistinguishable from the original. A change in speech quality has been observed, however, when the phase relation of the sine waves is altered. This occurs in practice when sine waves are processed for speech enhancement (e.g., time-scale modification and reducing the peak-to-RMS ratio) and for speech coding. This paper describes a zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase. The method provides a basis for improving sound quality through different levels of phase coherence in speech reconstruction: for time-scale modification, as a baseline system for coding, and for reducing the peak-to-RMS ratio by dispersion.
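
The effect of phase coherence on the peak-to-RMS ratio can be illustrated numerically: equal-amplitude harmonics summed with identical (zero) phases are maximally peaky, while dispersed phases (here a Schroeder-style quadratic phase rule, one common choice) lower the crest factor. A small sketch under those assumptions; the names are hypothetical.

```python
import numpy as np

def peak_to_rms(x):
    """Crest factor: peak magnitude over RMS level."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

def harmonic_sum(phases, f0=100.0, fs=8000.0, dur=0.1):
    """Sum of equal-amplitude harmonics of f0 with the given phases."""
    t = np.arange(int(fs * dur)) / fs
    k = np.arange(1, len(phases) + 1)
    return np.cos(2 * np.pi * f0 * np.outer(k, t) + phases[:, None]).sum(axis=0)
```

Zero phases make all harmonics align at the pulse instants (high peak-to-RMS); a quadratic phase progression spreads the energy in time, which is the dispersion effect mentioned above.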

Speech-state-adaptive simulation of co-channel talker interference suppression

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 23-26 May 1989, pp. 361-364.

Summary

A co-channel talker interference suppression system processes an input waveform containing the sum of two simultaneous speech signals, referred to as the target and the jammer, to produce a waveform estimate of the target speech signal alone. This paper describes the evaluation of a simulated suppression system performing ideal suppression of a jammer signal given the voicing states (voiced, unvoiced, silent) of the target and jammer speech as a function of time and given the isolated target and jammer speech waveforms. By applying suppression to selected regions of jammer speech as a function of the voicing states of the target and jammer, and by measuring the intelligibility of the resulting jammer-suppressed co-channel speech, it is possible to identify those regions of co-channel speech on which interference suppression most improves intelligibility. Such results can help focus algorithm development efforts.
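
The core simulation operation, ideally removing the jammer only in frames whose (target, jammer) voicing-state pair is selected, might be sketched as follows. The state labels, fixed frame length, and all names are hypothetical simplifications of the procedure described above.

```python
import numpy as np

def ideal_suppression_mix(target, jammer, t_states, j_states,
                          frame_len, suppress_when):
    """Simulated 'ideal' co-channel suppression: in frames whose
    (target, jammer) voicing-state pair is in `suppress_when`, the
    jammer is removed perfectly (possible here because the isolated
    waveforms are available); elsewhere the co-channel sum is kept."""
    out = target + jammer
    for f, pair in enumerate(zip(t_states, j_states)):
        if pair in suppress_when:
            s = f * frame_len
            out[s:s + frame_len] = target[s:s + frame_len]
    return out
```

Running intelligibility tests on outputs generated with different `suppress_when` sets is what lets the study rank voicing-state regions by how much suppression there helps.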

Review of neural networks for speech recognition

Published in:
Neural Comput., Vol. 1, 1989, pp. 1-38.

Summary

The performance of current speech recognition systems is far below that of humans. Neural nets offer the potential of providing massive parallelism, adaptation, and new algorithmic approaches to problems in speech recognition. Initial studies have demonstrated that multilayer networks with time delays can provide excellent discrimination between small sets of pre-segmented, difficult-to-discriminate words, consonants, and vowels. Performance for these small vocabularies has often exceeded that of more conventional approaches. Physiological front ends have provided improved recognition accuracy in noise, and a cochlear filter bank that could be used in these front ends has been implemented using micro-power analog VLSI techniques. Techniques have been developed to scale networks up in size to handle larger vocabularies, to reduce training time, and to train nets with recurrent connections. Multilayer perceptron classifiers are being integrated into conventional continuous-speech recognizers. Neural net architectures have been developed to perform the computations required by vector quantizers, static pattern classifiers, and the Viterbi decoding algorithm. Further work is necessary for large-vocabulary continuous-speech problems, to develop training algorithms that progressively build internal word models, and to develop compact VLSI neural net hardware.

A block diagram compiler for a digital signal processing MIMD computer

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 4, 6-9 April 1987, pp. 1867-1870.

Summary

A Block Diagram Compiler (BDC) has been designed and implemented for converting graphic block diagram descriptions of signal processing tasks into source code to be executed on a Multiple Instruction Stream - Multiple Data Stream (MIMD) array computer. The compiler takes as input a block diagram of a real-time DSP application, entered on a graphics CAE workstation, and translates it into efficient real-time assembly language code for the target multiprocessor array, much as a good assembly language programmer would write it. The current implementation produces code for a rectangular grid of Texas Instruments TMS32010 signal processors built at Lincoln Laboratory, but the concept could be extended to other processors or other geometries. This report begins by examining the current implementation of the BDC, including relevant aspects of the target hardware. Next, we describe the task-assignment module, which uses a simulated annealing algorithm to assign the processing tasks of the DSP application to individual processors in the array. Finally, our experiences with the current version of the BDC software and hardware are reported.
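
The task-assignment step can be illustrated with a toy simulated-annealing loop that places communicating tasks close together on a processor grid. The cost model here (Manhattan distance per communication edge, with load balancing ignored) and all names are assumptions for illustration, not the compiler's actual objective.

```python
import math
import random

def anneal_assignment(n_tasks, edges, grid, iters=5000, t0=2.0, seed=1):
    """Toy simulated annealing: map tasks to cells of an (rows, cols)
    processor grid so tasks joined by `edges` land close together.
    Cost = total Manhattan distance over edges (illustrative only)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(grid[0]) for c in range(grid[1])]
    place = {t: rng.choice(cells) for t in range(n_tasks)}

    def cost(p):
        return sum(abs(p[a][0] - p[b][0]) + abs(p[a][1] - p[b][1])
                   for a, b in edges)

    cur = best = cost(place)
    best_place = dict(place)
    for i in range(iters):
        temp = t0 * (1 - i / iters) + 1e-6   # linear cooling schedule
        t = rng.randrange(n_tasks)           # perturb one task's cell
        old = place[t]
        place[t] = rng.choice(cells)
        new = cost(place)
        # Accept improvements always; accept worse moves with
        # probability exp(-delta/temp) (the Metropolis criterion).
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best:
                best, best_place = cur, dict(place)
        else:
            place[t] = old                   # reject: undo the move
    return best_place, best
```

The occasional acceptance of worse moves at high temperature is what lets annealing escape the local minima that a pure greedy assignment would get stuck in.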

Mixed-phase deconvolution of speech based on a sine-wave model

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 6-9 April 1987, pp. 649-652.

Summary

This paper describes a new method of deconvolving the vocal cord excitation and vocal tract system response. The technique relies on a sine-wave representation of the speech waveform and forms the basis of an analysis-synthesis method which yields synthetic speech essentially indistinguishable from the original. Unlike an earlier sinusoidal analysis-synthesis technique that used a minimum-phase system estimate, the approach in this paper generates a "mixed-phase" system estimate and thus an improved decomposition of excitation and system components. Since a mixed-phase system estimate is removed from the speech waveform, the resulting excitation residual is less dispersed than the previous sinusoidal-based excitation estimate or the more commonly used linear prediction residual. A method of time-varying linear filtering is given as an alternative to sinusoidal reconstruction, similar to conventional time-domain synthesis used in certain vocoders, but without the requirement of pitch and voicing decisions. Finally, speech modification with a mixed-phase system estimate is shown to be capable of more closely preserving waveform shape in time-scale and pitch transformations than the earlier approach.

Multi-style training for robust isolated-word speech recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 6-9 April 1987, pp. 705-708.

Summary

A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions. Instead of speaking normally during training, talkers use different, easily produced talking styles. This technique was tested using a speech database that included speech produced under the stress of a workload task and when intense noise was presented through earphones. A continuous-distribution talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (five normally spoken tokens) and with multi-style training (one token each from normal, fast, clear, loud, and question-pitch talking styles). The average error rate under stress and normal conditions fell by more than a factor of two with multi-style training, and the average error rate under conditions sampled during training fell by a factor of four.

Two-stage discriminant analysis for improved isolated-word recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 6-9 April 1987, pp. 709-712.

Summary

This paper describes a two-stage isolated-word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage. During recognition, when the first-stage recognizer is unable to clearly differentiate between acoustically similar words such as "go" and "no," the second-stage discriminator is used. The second-stage system focuses on those parts of the unknown token which are most effective at discriminating the confused words. The system was tested on a 35-word, 10,710-token stressed-speech isolated-word database created at Lincoln Laboratory. Adding the second-stage discriminant system produced the best results to date on this database, reducing the overall error rate by more than a factor of two.
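
The two-stage control flow, deferring to a pair-specific discriminator only when the first stage's top two word scores are too close, can be sketched generically. The margin test and all names below are schematic stand-ins for the paper's HMM scoring and discriminant analysis, not its actual implementation.

```python
def two_stage_classify(token, scores, pair_discriminators, margin=0.1):
    """Second-stage arbitration: `scores` maps word -> first-stage score
    (e.g. log-likelihood); `pair_discriminators` maps a frozenset of two
    confusable words -> a callable that classifies the token between
    them. If the top two scores differ by less than `margin` and the
    pair has a discriminator, defer to it; otherwise keep the
    first-stage winner."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    pair = frozenset((best, runner_up))
    if scores[best] - scores[runner_up] < margin and pair in pair_discriminators:
        return pair_discriminators[pair](token)
    return best
```

The point of the design is that the expensive, pair-tuned discriminant is only consulted on the small fraction of tokens the HMM stage finds ambiguous.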