Publications

A speech recognizer using radial basis function neural networks in an HMM framework

Published in:
ICASSP'92, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 1, Speech Processing 1, 23-26 March 1992, pp. 629-632.

Summary

A high performance speaker-independent isolated-word speech recognizer was developed which combines hidden Markov models (HMMs) and radial basis function (RBF) neural networks. RBF networks in this recognizer use discriminant training techniques to estimate Bayesian probabilities for each speech frame while HMM decoders estimate overall word likelihood scores for network outputs. RBF training is performed after the HMM recognizer has automatically segmented training tokens using forced Viterbi alignment. In recognition experiments using a speaker-independent E-set database, the hybrid recognizer had an error rate of 11.5% compared to 15.7% for the robust unimodal Gaussian HMM recognizer upon which the hybrid system was based. The error rate was also lower than that of a tied-mixture HMM recognizer with the same number of centers. These results demonstrate that RBF networks can be successfully incorporated in hybrid recognizers and suggest that they may be capable of good performance with fewer parameters than required by Gaussian mixture classifiers.
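
The frame-level role the RBF networks play can be sketched in a few lines of Python. Everything below (the function name, the toy data layout, the single shared spread) is illustrative rather than taken from the paper; in the actual system the centers and discriminant output weights are trained from frames segmented by forced Viterbi alignment.

```python
import math

def rbf_posteriors(frame, centers, spread, weights):
    """Toy RBF network: Gaussian basis activations around learned
    centers, mixed by discriminatively trained output weights, then
    normalized so the outputs behave like per-frame class posteriors
    that an HMM decoder could consume as observation scores."""
    # Hidden layer: one Gaussian basis function per center.
    acts = [math.exp(-sum((f - c) ** 2 for f, c in zip(frame, ctr))
                     / (2.0 * spread ** 2))
            for ctr in centers]
    # Output layer: weighted sum per class, clipped at zero.
    raw = [max(0.0, sum(w * a for w, a in zip(row, acts)))
           for row in weights]
    total = sum(raw) or 1.0
    return [r / total for r in raw]
```

An HMM decoder would then accumulate these normalized per-frame outputs into overall word likelihood scores, as the abstract describes.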

Improved hidden Markov model speech recognition using radial basis function networks

Published in:
Advances in Neural Information Processing Systems, Denver, CO, 2-5 December 1991.

Summary

A high performance speaker-independent isolated-word hybrid speech recognizer was developed which combines Hidden Markov Models (HMMs) and Radial Basis Function (RBF) neural networks. In recognition experiments using a speaker-independent E-set database, the hybrid recognizer had an error rate of 11.5% compared to 15.7% for the robust unimodal Gaussian HMM recognizer upon which the hybrid system was based. These results and additional experiments demonstrate that RBF networks can be successfully incorporated in hybrid recognizers and suggest that they may be capable of good performance with fewer parameters than required by Gaussian mixture classifiers. A global parameter optimization method designed to minimize the overall word error rather than the frame recognition error failed to reduce the error rate.

Opportunities for advanced speech processing in military computer-based systems

Published in:
Proc. IEEE, Vol. 79, No. 11, November 1991, pp. 1626-1641.

Summary

This paper presents a study of military applications of advanced speech processing technology which includes three major elements: 1) review and assessment of current efforts in military applications of speech technology; 2) identification of opportunities for future military applications of advanced speech technology; and 3) identification of problem areas where research in speech processing is needed to meet application requirements, and of current research thrusts which appear promising. The relationship of this study to previous assessments of military applications of speech technology is discussed and substantial recent progress is noted. Current efforts in military applications of speech technology which are highlighted include: 1) narrow-band (2400 b/s) and very low-rate (50-1200 b/s) secure voice communication; 2) voice/data integration in computer networks; 3) speech recognition in fighter aircraft, military helicopters, battle management, and air traffic control training systems; and 4) noise and interference removal for human listeners. Opportunities for advanced applications are identified by means of descriptions of several generic systems which would be possible with advances in speech technology and in system integration. These generic systems include 1) an integrated multirate voice data communications terminal; 2) an interactive speech enhancement system; 3) a voice-controlled pilot's associate system; 4) advanced air traffic control training systems; 5) a battle management command and control support system with spoken natural language interface; and 6) a spoken language translation system. In identifying problem areas and research efforts to meet application requirements, it is observed that some of the most promising research involves the integration of speech algorithm techniques including speech coding, speech recognition, and speaker recognition.

Robust speech recognition using hidden Markov models: overview of a research program

Summary

This report presents an overview of a program of speech recognition research which was initiated in 1985 with the major goal of developing techniques for robust high performance speech recognition under the stress and noise conditions typical of a military aircraft cockpit. The work on recognition in stress and noise during 1985 and 1986 produced a robust Hidden Markov Model (HMM) isolated-word recognition (IWR) system with 99 percent speaker-dependent accuracy for several difficult stress/noise data bases, and very high performance for normal speech. Robustness techniques which were developed and applied include multi-style training, robust estimation of parameter variances, perceptually-motivated stress-tolerant distance measures, use of time-differential speech parameters, and discriminant analysis. These techniques and others produced more than an order-of-magnitude reduction in isolated-word recognition error rate relative to a baseline HMM system. An important feature of the Lincoln HMM system has been the use of continuous-observation HMM techniques, which provide a good basis for the development of the robustness techniques, and avoid the need for a vector quantizer at the input to the HMM system. Beginning in 1987, the robust HMM system has been extended to continuous speech recognition for both speaker-dependent and speaker-independent tasks. The robust HMM continuous speech recognizer was integrated in real-time with a stressing simulated flight task, which was judged to be very realistic by a number of military pilots. Phrase recognition accuracy on the limited-task-domain (28-word vocabulary) flight task is better than 99.9 percent. Recently, the robust HMM system has been extended to large-vocabulary continuous speech recognition, and has yielded excellent performance in both speaker-dependent and speaker-independent recognition on the DARPA 1000-word vocabulary resource management data base. 
Current efforts include further improvements to the HMM system, techniques for the integration of speech recognition with natural language processing, and research on integration of neural network techniques with HMM.

Spoken language systems

Summary

Spoken language is the most natural and common form of human-human communication, whether face to face, over the telephone, or through various communication media such as radio and television. In contrast, human-machine interaction is currently achieved largely through keyboard strokes, pointing, or other mechanical means, using highly stylized languages. Communication, whether human-human or human-machine, suffers greatly when the two communicating agents do not "speak" the same language. The ultimate goal of work on spoken language systems is to overcome this language barrier by building systems that provide the necessary interpretive function between various languages, thus establishing spoken language as a versatile and natural communication medium between humans and machines and among humans speaking different languages.

Multi-style training for robust isolated-word speech recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 6-9 April 1987, pp. 705-708.

Summary

A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions. Instead of speaking normally during training, talkers use different, easily produced, talking styles. This technique was tested using a speech data base that included stress speech produced during a workload task and when intense noise was presented through earphones. A continuous-distribution talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (five normally spoken tokens) and with multi-style training (one token each from normal, fast, clear, loud, and question-pitch talking styles). The average error rate under stress and normal conditions fell by more than a factor of two with multi-style training and the average error rate under conditions sampled during training fell by a factor of four.
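
As a rough sketch of how such a training set differs from normal-only training, the hypothetical helper below draws one token from each talking style instead of several normal tokens. The data layout and names are invented for illustration; the paper's actual tokens are recorded utterances, not strings.

```python
def multi_style_training_set(tokens_by_style, styles):
    """Assemble a multi-style training set: for each word, take one
    token from each listed talking style rather than several normally
    spoken tokens. Assumed layout: word -> style -> list of tokens."""
    training = {}
    for word, by_style in tokens_by_style.items():
        training[word] = [by_style[s][0] for s in styles if s in by_style]
    return training
```

The recognizer itself is unchanged; only the variety of the training material differs, which is what drives the reported error-rate reductions.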

Two-stage discriminant analysis for improved isolated-word recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 6-9 April 1987, pp. 709-712.

Summary

This paper describes a two-stage isolated-word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage. During recognition, when the first-stage recognizer is unable to clearly differentiate between acoustically similar words such as "go" and "no", the second-stage discriminator is used. The second-stage system focuses on those parts of the unknown token which are most effective at discriminating the confused words. The system was tested on a 35 word, 10,710 token stress speech isolated word data base created at Lincoln Laboratory. Adding the second-stage discriminating system produced the best results to date on this data base, reducing the overall error rate by more than a factor of two.
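
The idea of a pairwise second stage can be illustrated with a minimal Fisher-style linear discriminant for one confusable word pair. This is a diagonal-covariance sketch under assumed fixed-length feature vectors; the paper's discriminator additionally selects which parts of the token to examine, which this snippet does not attempt.

```python
def pair_discriminant(tokens_a, tokens_b):
    """Weight each feature dimension by the class-mean difference over
    the summed within-class variances, so dimensions that separate the
    confusable pair (e.g. "go" vs. "no") dominate the decision."""
    def stats(tokens):
        n, dim = len(tokens), len(tokens[0])
        mean = [sum(t[i] for t in tokens) / n for i in range(dim)]
        var = [sum((t[i] - mean[i]) ** 2 for t in tokens) / n
               for i in range(dim)]
        return mean, var
    mean_a, var_a = stats(tokens_a)
    mean_b, var_b = stats(tokens_b)
    return [(ma - mb) / (va + vb + 1e-9)
            for ma, mb, va, vb in zip(mean_a, mean_b, var_a, var_b)]

def project(token, w):
    """Project a token onto the discriminant direction; larger scores
    lean toward the first class of the pair."""
    return sum(x * wi for x, wi in zip(token, w))
```

In a two-stage system, this discriminant would only be consulted when the first-stage HMM scores for the two words are too close to call.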

Robust HMM-based techniques for recognition of speech produced under stress and in noise

Published in:
Proc. Speech Tech '86, 28-30 April 1986, pp. 241-249.

Summary

Substantial improvements in speech recognition performance on speech produced under stress and in noise have been achieved through the development of techniques for enhancing the robustness of a baseline isolated-word Hidden Markov Model recognizer. The baseline HMM is a continuous-observation system using mel-frequency cepstra as the observation parameters. Enhancement techniques which were developed and tested include: placing a lower limit on the estimated variances of the observations; addition of temporal difference parameters; improved duration modelling; use of fixed diagonal covariance distance functions, with variances adjusted according to perceptual considerations; cepstral domain stress compensation; and multi-style training, where the system is trained on speech spoken with a variety of talking styles. With perceptually-motivated covariance and a combination of normal (single-frame) and differential cepstral observations, average error rates over five simulated-stress conditions were reduced from 20% (baseline) to 2.5% on a simulated-stress data base (105-word vocabulary, eight talkers, five conditions). With variance limiting, normal plus differential observations, and multi-style training, an error rate of 1.8% was achieved. Additional tests were conducted on a data base including nine talkers, eight talking styles, with speech produced under two levels of motor-workload stress. Substantial reductions in error rate were demonstrated for the noise and workload conditions, when multiple talking styles, rather than only normal speech, were used in training. In experiments conducted in simulated fighter cockpit noise, it was shown that error rates could be reduced significantly by training under multiple noise exposure conditions.
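
Two of the listed techniques, variance limiting and temporal difference parameters, are simple enough to sketch. The snippet below is illustrative only: the abstract does not specify these exact forms, the floor value is a free parameter, and real systems typically compute differences by regression over a window rather than a two-frame subtraction.

```python
def limit_variances(variances, floor):
    """Variance limiting: place a lower bound on the per-dimension
    variance estimates of a continuous-observation HMM's Gaussian
    densities, so sparsely trained states cannot claim implausibly
    sharp distributions."""
    return [max(v, floor) for v in variances]

def append_delta(cepstra):
    """Append first-order temporal difference parameters to each
    cepstral frame (simple backward difference; the first frame
    gets a zero delta)."""
    out = []
    for t, frame in enumerate(cepstra):
        prev = cepstra[t - 1] if t > 0 else frame
        out.append(frame + [x - p for x, p in zip(frame, prev)])
    return out
```

Both operate purely on the observation model, which is why they compose cleanly with multi-style training and the other robustness techniques listed above.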

A phrase recognizer using syllable-based acoustic measurements

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-26, No. 5, October 1978, pp. 409-418.

Summary

A system for the recognition of spoken phrases is described. The recognizer assumes that the input utterance contains one of a known set of allowable phrases, which may be spoken within a longer carrier sentence. Analysis is performed on a syllable-by-syllable basis with only the strong syllables considered in the recognition process. Each strong syllable is represented in terms of a set of distinguishing acoustic measurements taken at time points in and around the syllable nucleus. Phrases are represented as sequences of strong syllables. All parameters used in recognition are derived from LPC coefficients. Input speech is limited to 3.3 kHz upper frequency. Recognition is completed within 1-3 s after the utterance is spoken. An interactive training facility allows flexible composition of key phrase sets. Testing was performed for a number of phrase sets each containing ten or fewer phrases, and included equal numbers of talkers used in training and talkers not used in training. Average phrase recognition accuracy was 95 percent when parameters were derived from unquantized (i.e., 16 bit) LPC coefficients and 90 percent when the LPC coefficients were transmitted to the recognizer across the ARPA network at 3500 bits/s. The recognizer has been incorporated into a user interface system where the parameters required to set up a point-to-point ARPANET voice connection can be established remotely by voice.
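
The "phrases as sequences of strong syllables" idea can be caricatured as follows. The representation and scoring here are invented stand-ins: the labels substitute for the paper's acoustic measurements around each syllable nucleus, and positionwise agreement substitutes for its distance measures.

```python
def recognize_phrase(detected, phrase_syllables):
    """Pick the allowable phrase whose strong-syllable sequence best
    matches the strong syllables detected in the utterance.
    Assumed layout: phrase name -> list of strong-syllable labels."""
    def agreement(seq):
        # Positionwise label agreement, a toy stand-in for acoustic
        # distance between measured and stored syllable templates.
        return sum(1 for a, b in zip(detected, seq) if a == b)
    return max(phrase_syllables,
               key=lambda name: agreement(phrase_syllables[name]))
```

Because weak syllables are ignored, a carrier sentence around the key phrase contributes nothing to the match, which is the property the abstract relies on.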