Publications

Short-time signal representation by nonlinear difference equations

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 3, Digital Signal Processing, 3-6 April 1990, pp. 1551-1554.

Summary

The solution of a nonlinear difference equation can take on complicated deterministic behavior which appears to be random for certain values of the equation's coefficients. Due to the sensitivity to initial conditions of the output of such "chaotic" systems, it is difficult to duplicate the waveform structure by parameter analysis and waveform synthesis techniques. In this paper, methods are investigated for short-time analysis and synthesis of signals from a class of second-order difference equations with a cubic nonlinearity. In analysis, two methods are explored for estimating equation coefficients: (1) prediction error minimization (a linear estimation problem) and (2) waveform error minimization (a nonlinear estimation problem). In the latter case, which improves on the prediction error solution, an iterative analysis-by-synthesis method is derived which treats both the initial conditions and the equation coefficients as free variables. Parameter estimates from these techniques are used in sequential short-time synthesis procedures. Possible application to modeling "quasi-periodic" behavior in speech waveforms is discussed.
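The first analysis method above exploits the fact that such an equation, though nonlinear in its signal, is linear in its coefficients, so prediction error minimization reduces to ordinary least squares. A minimal sketch, assuming a hypothetical model form x[n] = a1*x[n-1] + a2*x[n-2] + a3*x[n-1]^3 (the paper's exact equation is not given in the abstract):

```python
import numpy as np

# Hypothetical second-order equation with a cubic nonlinearity:
#   x[n] = a1*x[n-1] + a2*x[n-2] + a3*x[n-1]**3
true = np.array([1.4, -0.7, -0.1])

# Generate a short trajectory from chosen initial conditions.
N = 60
x = np.zeros(N)
x[0], x[1] = 0.5, 0.4
for n in range(2, N):
    x[n] = true[0]*x[n-1] + true[1]*x[n-2] + true[2]*x[n-1]**3

# Prediction error minimization: the model is linear in its
# coefficients, so least squares recovers them directly.
A = np.column_stack([x[1:N-1], x[0:N-2], x[1:N-1]**3])
b = x[2:N]
est, *_ = np.linalg.lstsq(A, b, rcond=None)
print(est)  # close to [1.4, -0.7, -0.1] in this noiseless case
```

Waveform error minimization, by contrast, compares the synthesized and original waveforms directly, which is nonlinear in the coefficients and motivates the iterative analysis-by-synthesis procedure.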

Noise reduction using a soft-decision sine-wave vector quantizer

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, Speech Processing 2; VLSI, Audio and Electroacoustics, 3-6 April 1990, pp. 821-824.

Summary

The need for noise reduction arises in speech communication channels, such as ground-to-air transmission and ground-based cellular radio, to improve vocoder quality and speech recognition accuracy. In this paper, noise reduction is performed in the context of a high-quality harmonic zero-phase sine-wave analysis/synthesis system which is characterized by sine-wave amplitudes, a voicing probability, and a fundamental frequency. Least-squared error estimation of a harmonic sine-wave representation leads to a "soft decision" template estimate consisting of sine-wave amplitudes and a voicing probability. The least-squares solution is modified to use template-matching with "nearest neighbors." The reconstruction is improved by using the modified least-squares solution only in spectral regions with low signal-to-noise ratio. The results, although preliminary, provide evidence that harmonic zero-phase sine-wave analysis/synthesis, combined with effective estimation of sine-wave amplitudes and probability of voicing, offers a promising approach to noise reduction.

Automatic talker activity labeling for co-channel talker interference suppression

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, Speech Processing 2; VLSI, Audio and Electroacoustics, 3-6 April 1990, pp. 813-816.

Summary

This paper describes a speaker activity detector taking co-channel speech as input and labeling intervals of the input as target-only, jammer-only, or two-speaker (target+jammer). The algorithms applied were borrowed primarily from speaker recognition, thereby allowing us to use speaker-dependent test-utterance-independent information in a front-end for co-channel talker interference suppression. Parameters studied included classifier choice (vector quantization vs. Gaussian), training method (unsupervised vs. supervised), test utterance segmentation (uniform vs. adaptive), and training and testing target-to-jammer ratios. Using analysis interval lengths of 100 ms, performance reached 80% correct detection.
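The vector-quantization classifier option above can be illustrated with a toy nearest-codebook labeler. Everything here is a stand-in, assuming synthetic 2-D features and one-entry codebooks; the paper uses real speech features and larger speaker-dependent codebooks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for feature frames of each activity class
# (target-only, jammer-only, two-speaker); not the paper's features.
classes = {
    "target": rng.normal([0.0, 0.0], 0.3, size=(200, 2)),
    "jammer": rng.normal([2.0, 0.0], 0.3, size=(200, 2)),
    "both":   rng.normal([1.0, 1.5], 0.3, size=(200, 2)),
}

# One-entry-per-class "codebook": a minimal vector-quantization
# classifier trained by averaging each class's frames.
codebooks = {name: frames.mean(axis=0) for name, frames in classes.items()}

def label_interval(frames):
    # Accumulate VQ distortion over the analysis interval and
    # pick the class whose codebook explains it best.
    def distortion(name):
        return np.sum((frames - codebooks[name]) ** 2)
    return min(codebooks, key=distortion)

test = rng.normal([2.0, 0.0], 0.3, size=(10, 2))  # a jammer-like interval
print(label_interval(test))  # "jammer"
```

Summing distortion over an interval, rather than labeling frame by frame, mirrors the paper's use of 100 ms analysis intervals.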

Robust speech recognition using hidden Markov models: overview of a research program

Summary

This report presents an overview of a program of speech recognition research which was initiated in 1985 with the major goal of developing techniques for robust high performance speech recognition under the stress and noise conditions typical of a military aircraft cockpit. The work on recognition in stress and noise during 1985 and 1986 produced a robust Hidden Markov Model (HMM) isolated-word recognition (IWR) system with 99 percent speaker-dependent accuracy for several difficult stress/noise data bases, and very high performance for normal speech. Robustness techniques which were developed and applied include multi-style training, robust estimation of parameter variances, perceptually-motivated stress-tolerant distance measures, use of time-differential speech parameters, and discriminant analysis. These techniques and others produced more than an order-of-magnitude reduction in isolated-word recognition error rate relative to a baseline HMM system. An important feature of the Lincoln HMM system has been the use of continuous-observation HMM techniques, which provide a good basis for the development of the robustness techniques, and avoid the need for a vector quantizer at the input to the HMM system. Beginning in 1987, the robust HMM system has been extended to continuous speech recognition for both speaker-dependent and speaker-independent tasks. The robust HMM continuous speech recognizer was integrated in real-time with a stressing simulated flight task, which was judged to be very realistic by a number of military pilots. Phrase recognition accuracy on the limited-task-domain (28-word vocabulary) flight task is better than 99.9 percent. Recently, the robust HMM system has been extended to large-vocabulary continuous speech recognition, and has yielded excellent performance in both speaker-dependent and speaker-independent recognition on the DARPA 1000-word vocabulary resource management data base. 
Current efforts include further improvements to the HMM system, techniques for the integration of speech recognition with natural language processing, and research on integration of neural network techniques with HMM.

An approach to co-channel talker interference suppression using a sinusoidal model for speech

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. 38, No. 1, January 1990, pp. 56-59.

Summary

This paper describes a new approach to co-channel talker interference suppression based on a sinusoidal representation of speech. The technique fits a sinusoidal model to additive vocalic speech segments such that the least mean-squared error between the model and the summed waveforms is obtained. Enhancement is achieved by synthesizing a waveform from the sine waves attributed to the desired speaker. Least-squares estimation is applied to obtain sine-wave amplitudes and phases of both talkers, based on either a priori sine-wave frequencies or a priori fundamental frequency contours. When the frequencies of the two waveforms are closely spaced, the performance is significantly improved by exploiting the time evolution of the sinusoidal parameters across multiple analysis frames. The least-squared error approach is also extended, under restricted conditions, to estimate fundamental frequency contours of both speakers from the summed waveforms. The results obtained, although limited in their scope, provide evidence that the sinusoidal analysis/synthesis model with effective parameter estimation techniques offers a promising approach to the problem of co-channel talker interference suppression over a range of conditions.
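The core estimation step, given a priori sine-wave frequencies, is linear: writing each sinusoid as a cosine and sine pair makes the amplitudes and phases recoverable by least squares. A simplified single-frame, single-talker sketch (the paper handles two talkers and multi-frame parameter evolution):

```python
import numpy as np

# Least-squares fit of sine-wave amplitudes and phases to a waveform,
# given a priori frequencies. Sampling rate and frequencies are
# illustrative assumptions.
fs = 8000.0
n = np.arange(256) / fs
freqs = np.array([200.0, 330.0, 440.0])           # assumed known a priori
amps, phases = [1.0, 0.6, 0.3], [0.5, -1.0, 2.0]  # ground truth
x = sum(a * np.cos(2*np.pi*f*n + p) for a, f, p in zip(amps, freqs, phases))

# Model x[n] = sum_k c_k cos(w_k n) + s_k sin(w_k n); solving for
# (c_k, s_k) is linear least squares, then amplitude and phase follow
# from a*cos(wn+p) = a*cos(p)*cos(wn) - a*sin(p)*sin(wn).
A = np.hstack([np.cos(2*np.pi*np.outer(n, freqs)),
               np.sin(2*np.pi*np.outer(n, freqs))])
coef, *_ = np.linalg.lstsq(A, x, rcond=None)
c, s = coef[:3], coef[3:]
amp_est = np.hypot(c, s)
phase_est = np.arctan2(-s, c)
print(amp_est)  # close to [1.0, 0.6, 0.3]
```

With two talkers, the design matrix simply stacks both talkers' frequency tracks, and the enhanced target is resynthesized from its own columns' coefficients.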

Spoken language systems

Summary

Spoken language is the most natural and common form of human-human communication, whether face to face, over the telephone, or through various communication media such as radio and television. In contrast, human-machine interaction is currently achieved largely through keyboard strokes, pointing, or other mechanical means, using highly stylized languages. Communication, whether human-human or human-machine, suffers greatly when the two communicating agents do not "speak" the same language. The ultimate goal of work on spoken language systems is to overcome this language barrier by building systems that provide the necessary interpretive function between various languages, thus establishing spoken language as a versatile and natural communication medium between humans and machines and among humans speaking different languages.

Far-echo cancellation in the presence of frequency offset (full duplex modem)

Published in:
IEEE Trans. Commun., Vol. 37, No. 6, June 1989, pp. 635-644.

Summary

In this paper, we present a design for a full-duplex echo-cancelling data modem based on a combined adaptive reference algorithm and adaptive channel equalizer. The adaptive reference algorithm has the advantage that interference to the echo canceller caused by the far-end signal can be eliminated by subtracting an estimate of the far-end signal based on receiver decisions. This technique provides a new approach for full-duplex far-echo cancellation in which the far echo can be cancelled in spite of carrier frequency offset. To estimate the frequency offset, the system uses a separate receiver structure for the far echo which provides equalization of the far-echo channel and tracks the frequency offset in the far echo. The feasibility of the echo-cancelling algorithms is demonstrated by computer simulation with realistic channel distortions and with 4800 bits/s data transmission at which rate frequency offset in the far echo becomes important.
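The adaptive-reference idea can be sketched with a toy LMS echo canceller: the far-end signal reconstructed from receiver decisions is subtracted before adaptation, so the canceller sees only echo estimation error. This is a simplified illustration with idealized (exact) decisions and no frequency offset, not the paper's full design:

```python
import numpy as np

rng = np.random.default_rng(1)

# Local transmitted symbols, an unknown echo channel, and a far-end
# data signal that would otherwise disturb adaptation.
N, taps = 5000, 8
x = rng.choice([-1.0, 1.0], N)          # local transmitted symbols
h = rng.normal(0, 1, taps)              # unknown echo channel
echo = np.convolve(x, h)[:N]
far = 0.5 * rng.choice([-1.0, 1.0], N)  # far-end data signal
r = echo + far                          # received waveform

w = np.zeros(taps)
mu = 0.01
for n in range(taps, N):
    u = x[n-taps+1:n+1][::-1]           # regressor (most recent first)
    e = r[n] - w @ u - far[n]           # subtract the adaptive reference
    w += mu * e * u                     # LMS update on the cleaned error

print(np.max(np.abs(w - h)))  # small: echo channel identified
```

Without the `- far[n]` term, the far-end signal acts as noise in the LMS error and limits cancellation depth; the adaptive reference removes that interference, which is what lets the full design cope with carrier frequency offset in the far echo.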

Phase coherence in speech reconstruction for enhancement and coding applications

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Speech Processing 1, 23-26 May 1989, pp. 207-209.

Summary

It has been shown that an analysis-synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially perceptually indistinguishable from the original. A change in speech quality has been observed, however, when the phase relation of the sine waves is altered. This occurs in practice when sine waves are processed for speech enhancement (e.g., time-scale modification and reducing peak-to-RMS ratio) and for speech coding. This paper describes a zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase. The method provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification, for a baseline system for coding, and for reducing the peak-to-RMS ratio by dispersion.

Speech-state-adaptive simulation of co-channel talker interference suppression

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 23-26 May 1989, pp. 361-364.

Summary

A co-channel talker interference suppression system processes an input waveform containing the sum of two simultaneous speech signals, referred to as the target and the jammer, to produce a waveform estimate of the target speech signal alone. This paper describes the evaluation of a simulated suppression system performing ideal suppression of a jammer signal given the voicing states (voiced, unvoiced, silent) of the target and jammer speech as a function of time and given the isolated target and jammer speech waveforms. By applying suppression to select regions of jammer speech as a function of the voicing states of the target and jammer, and by measuring the intelligibility of the resulting jammer suppressed co-channel speech, it is possible to identify those regions of co-channel speech on which interference suppression most improves intelligibility. Such results can help focus algorithm development efforts.

Review of neural networks for speech recognition

Published in:
Neural Comput., Vol. 1, 1989, pp. 1-38.

Summary

The performance of current speech recognition systems is far below that of humans. Neural nets offer the potential of providing massive parallelism, adaptation, and new algorithmic approaches to problems in speech recognition. Initial studies have demonstrated that multi-layer networks with time delays can provide excellent discrimination between small sets of pre-segmented difficult-to-discriminate words, consonants, and vowels. Performance for these small vocabularies has often exceeded that of more conventional approaches. Physiological front ends have provided improved recognition accuracy in noise and a cochlea filter-bank that could be used in these front ends has been implemented using micro-power analog VLSI techniques. Techniques have been developed to scale networks up in size to handle larger vocabularies, to reduce training time, and to train nets with recurrent connections. Multilayer perceptron classifiers are being integrated into conventional continuous-speech recognizers. Neural net architectures have been developed to perform the computations required by vector quantizers, static pattern classifiers, and the Viterbi decoding algorithm. Further work is necessary for large-vocabulary continuous-speech problems, to develop training algorithms that progressively build internal word models, and to develop compact VLSI neural net hardware.