Publications

An integrated speech-background model for robust speaker identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 23-26 March 1992, pp. 185-188.

Summary

This paper examines a procedure for text-independent speaker identification in noisy environments where the interfering background signals cannot be characterized using traditional broadband or impulsive noise models. In the procedure, both the speaker and the background processes are modeled using mixtures of Gaussians. Speaker and background models are integrated into a unified statistical framework, allowing the underlying speech process to be decoupled from the noise-corrupted observations via the expectation-maximization algorithm. Using this formalism, speaker model parameters are estimated in the presence of the background process, and a scoring procedure is implemented for computing the speaker likelihood in the noise-corrupted environment. Performance is evaluated using a 16-speaker conversational speech database with both "speech babble" and white noise background processes.
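
As a rough illustration of mixture-of-Gaussians speaker and background modeling (a hedged sketch, not the paper's EM-based integration of the two processes), the following Python fragment fits one GMM per speaker plus a background GMM and scores a noisy utterance with a simple frame-wise combination; the feature matrices, mixture sizes, and the max-combination rule are assumptions introduced for illustration.

    # Rough sketch only: per-speaker and background Gaussian mixture models
    # scored on a noisy test utterance. This is NOT the paper's EM-based
    # integration of the two processes; the frame-wise max combination is a
    # simplified stand-in, and feature extraction (e.g. cepstral vectors,
    # frames x dimensions) is assumed to be done elsewhere.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_gmm(features, n_components=8):
        """Fit a diagonal-covariance Gaussian mixture to feature vectors."""
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(features)
        return gmm

    def identify_speaker(test_features, speaker_gmms, background_gmm):
        """Return the speaker whose model best explains the noisy utterance,
        letting the background model account for frames it fits better."""
        bg_ll = background_gmm.score_samples(test_features)   # per-frame log-likelihood
        scores = {}
        for name, gmm in speaker_gmms.items():
            spk_ll = gmm.score_samples(test_features)
            scores[name] = np.sum(np.maximum(spk_ll, bg_ll))
        return max(scores, key=scores.get)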

Shape invariant time-scale and pitch modification of speech

Published in:
IEEE Trans. Signal Process., Vol. 40, No. 3, March 1992, pp. 497-510.

Summary

The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. The goal of this paper is to develop a time-scale modification system that preserves this shape-invariance property during voicing. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its capability of performing time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.
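
As a minimal sketch of generic sinusoidal time-scale modification (not the paper's shape-invariant system, which separately models and modifies the vocal-tract and excitation phase), the fragment below stretches assumed per-frame sine-wave amplitude and frequency tracks by a factor rho and resynthesizes by integrating instantaneous frequency; the hop size and interpolation scheme are assumptions.

    # Generic sine-wave time-scale modification sketch: stretch the parameter
    # tracks, keep the frequencies (and hence pitch), integrate frequency to
    # get phase. Hop size and linear interpolation are assumptions.
    import numpy as np

    def time_scale_sinusoids(amps, freqs, rho, fs, hop=256):
        """amps, freqs: (n_frames x n_sines) per-frame sine-wave amplitudes and
        frequencies (Hz); rho: time-scale factor (>1 slows speech down)."""
        n_frames, n_sines = amps.shape
        n_out = int(round(rho * n_frames * hop))
        t_out = np.arange(n_out) / (rho * hop)   # output sample -> original frame index
        out = np.zeros(n_out)
        for k in range(n_sines):
            a = np.interp(t_out, np.arange(n_frames), amps[:, k])
            f = np.interp(t_out, np.arange(n_frames), freqs[:, k])
            phase = 2.0 * np.pi * np.cumsum(f) / fs   # integrate instantaneous frequency
            out += a * np.cos(phase)
        return out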

Opportunities for advanced speech processing in military computer-based systems

Published in:
Proc. IEEE, Vol. 79, No. 11, November 1991, pp. 1626-1641.

Summary

This paper presents a study of military applications of advanced speech processing technology which includes three major elements: 1) review and assessment of current efforts in military applications of speech technology; 2) identification of opportunities for future military applications of advanced speech technology; and 3) identification of problem areas where research in speech processing is needed to meet application requirements, and of current research thrusts which appear promising. The relationship of this study to previous assessments of military applications of speech technology is discussed and substantial recent progress is noted. Current efforts in military applications of speech technology which are highlighted include: 1) narrow-band (2400 b/s) and very low-rate (50-1200 b/s) secure voice communication; 2) voice/data integration in computer networks; 3) speech recognition in fighter aircraft, military helicopters, battle management, and air traffic control training systems; and 4) noise and interference removal for human listeners. Opportunities for advanced applications are identified by means of descriptions of several generic systems which would be possible with advances in speech technology and in system integration. These generic systems include 1) an integrated multirate voice/data communications terminal; 2) an interactive speech enhancement system; 3) a voice-controlled pilot's associate system; 4) advanced air traffic control training systems; 5) a battle management command and control support system with spoken natural language interface; and 6) a spoken language translation system. In identifying problem areas and research efforts to meet application requirements, it is observed that some of the most promising research involves the integration of speech algorithm techniques including speech coding, speech recognition, and speaker recognition.

Low-rate speech coding based on the sinusoidal model

Published in:
Chapter 6 in Advances in Speech Signal Processing, Marcel Dekker, Inc., 1992, pp. 165-208.

Summary

One approach to the problem of representation of speech signals is to use the speech production model in which speech is viewed as the result of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract. In many applications it suffices to assume that the glottal excitation can be in one of two possible states corresponding to voiced or unvoiced speech. In attempts to design high-quality speech coders at the midband rates, generalizations of the binary excitation model have been developed. One such approach is multipulse (Atal and Remde, 1982) which uses more than one pitch pulse to model voiced speech and a possibly random set of pulses to model unvoiced speech. Code excited linear prediction (CELP) (Schroeder and Atal, 1985) is another representation which models the excitation as one of a number of random sequences or "codewords" superimposed on periodic pitch pulses. In this chapter the goal is also to generalize the model for the glottal excitation; but instead of using impulses as in multipulse or random sequences as in CELP, the excitation is assumed to be composed of sinusoidal components of arbitrary amplitudes, frequencies, and phases (McAulay and Quatieri, 1986).
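
As a hedged illustration of the analysis side of such a sinusoidal representation, the sketch below picks peaks of a windowed frame's short-time spectrum as sine-wave amplitudes, frequencies, and phases; the window, FFT size, and simple peak test are assumptions rather than the chapter's actual analysis system.

    # Simplified sine-wave analysis for one frame: local maxima of the
    # magnitude spectrum become sinusoidal components. Window, FFT size,
    # and the peak test are illustrative assumptions.
    import numpy as np

    def sine_wave_params(frame, fs, n_fft=1024, max_sines=40):
        win = np.hanning(len(frame))
        spec = np.fft.rfft(frame * win, n_fft)
        mag = np.abs(spec)
        # local maxima of the magnitude spectrum
        peaks = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
        # keep only the strongest peaks
        peaks = peaks[np.argsort(mag[peaks])[::-1][:max_sines]]
        freqs = peaks * fs / n_fft          # Hz
        amps = mag[peaks]
        phases = np.angle(spec[peaks])
        return amps, freqs, phases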

Speech nonlinearities, modulations, and energy operators

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 14-17 May 1991, pp. 421-424.

Summary

In this paper, we investigate an AM-FM model for representing modulations in speech resonances. Specifically, we propose a frequency modulation (FM) model for the time-varying formants whose amplitude varies as the envelope of an amplitude-modulated (AM) signal. To detect the modulations we apply the energy operator Ψ(x) = (ẋ)² − xẍ and its discrete counterpart. We found that Ψ can approximately track the envelope of AM signals, the instantaneous frequency of FM signals, and the product of these two functions in the general case of AM-FM signals. Several experiments are reported on the applications of this AM-FM modeling to speech signals, bandpass filtered via Gabor filtering.
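
The discrete counterpart of the operator is the standard Teager-Kaiser form Ψ[x(n)] = x(n)² − x(n−1)x(n+1); a minimal sketch follows (the edge handling is an assumption).

    # Discrete Teager-Kaiser energy operator: Psi[x(n)] = x(n)^2 - x(n-1)*x(n+1).
    import numpy as np

    def teager_energy(x):
        x = np.asarray(x, dtype=float)
        psi = np.empty_like(x)
        psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
        psi[0], psi[-1] = psi[1], psi[-2]   # simple edge handling (assumption)
        return psi

    # For an AM-FM signal x(n) = a(n)*cos(phi(n)), Psi approximately tracks the
    # squared product of the envelope a(n) and the instantaneous frequency.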

Peak-to-rms reduction of speech based on a sinusoidal model

Published in:
IEEE Trans. Signal Process., Vol. 39, No. 2, February 1991, pp. 273-288.

Summary

In a number of applications, a speech waveform is processed using phase dispersion and amplitude compression to reduce its peak-to-rms ratio so as to increase loudness and intelligibility while minimizing perceived distortion. In this paper, a sinusoidal-based analysis/synthesis system is used to apply a radar design solution to the problem of dispersing the phase of a speech waveform. Unlike conventional methods of phase dispersion, this solution technique adapts dynamically to the pitch and spectral characteristics of the speech, while maintaining the original spectral envelope. The solution can also be used to drive the sine-wave amplitude modification for amplitude compression, and is coupled to the desired shaping of the speech spectrum. The new dispersion solution, when integrated with amplitude compression, results in a significant reduction in the peak-to-rms ratio of the speech waveform with acceptable loss in quality. Application of a real-time prototype sine-wave preprocessor to AM radio broadcasting is described.
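
For reference, the figure of merit being reduced is simply the ratio of the waveform peak to its rms value; a small sketch follows (the dB convention is an assumption).

    # Peak-to-rms ratio of a waveform, in dB.
    import numpy as np

    def peak_to_rms_db(x):
        x = np.asarray(x, dtype=float)
        peak = np.max(np.abs(x))
        rms = np.sqrt(np.mean(x ** 2))
        return 20.0 * np.log10(peak / rms)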

Short-time signal representation by nonlinear difference equations

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 3, Digital Signal Processing, 3-6 April 1990, pp. 1551-1554.

Summary

The solution of a nonlinear difference equation can take on complicated deterministic behavior which appears to be random for certain values of the equation's coefficients. Due to the sensitivity to initial conditions of the output of such "chaotic" systems, it is difficult to duplicate the waveform structure by parameter analysis and waveform synthesis techniques. In this paper, methods are investigated for short-time analysis and synthesis of signals from a class of second-order difference equations with a cubic nonlinearity. In analysis, two methods are explored for estimating equation coefficients: (1) prediction error minimization (a linear estimation problem) and (2) waveform error minimization (a nonlinear estimation problem). In the latter case, which improves on the prediction error solution, an iterative analysis-by-synthesis method is derived that treats both the initial conditions and the equation coefficients as free variables. Parameter estimates from these techniques are used in sequential short-time synthesis procedures. Possible application to modeling "quasi-periodic" behavior in speech waveforms is discussed.
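
As a hedged sketch under an assumed parameterization (the abstract does not give the exact equation form), the fragment below simulates a second-order difference equation with a cubic term and recovers its coefficients by prediction-error minimization, which is linear in the unknowns.

    # Hypothetical second-order difference equation with a cubic nonlinearity
    # (the exact form used in the paper is not specified in this abstract),
    # plus prediction-error coefficient estimation, linear in (a1, a2, b).
    import numpy as np

    def simulate(coeffs, x0, x1, n):
        a1, a2, b = coeffs
        x = np.empty(n)
        x[0], x[1] = x0, x1
        for i in range(2, n):
            x[i] = a1 * x[i - 1] + a2 * x[i - 2] + b * x[i - 1] ** 3
        return x

    def estimate_coeffs(x):
        """Least-squares fit of (a1, a2, b) to a short-time segment."""
        target = x[2:]
        regressors = np.column_stack([x[1:-1], x[:-2], x[1:-1] ** 3])
        coeffs, *_ = np.linalg.lstsq(regressors, target, rcond=None)
        return coeffs

    x = simulate((1.8, -0.95, -0.05), 0.3, 0.3, 300)
    print(np.round(estimate_coeffs(x), 4))   # recovers approximately (1.8, -0.95, -0.05)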

Noise reduction using a soft-decision sine-wave vector quantizer

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, Speech Processing 2; VLSI, Audio and Electroacoustics, 3-6 April 1990, pp. 821-824.

Summary

The need for noise reduction arises in speech communication channels, such as ground-to-air transmission and ground-based cellular radio, to improve vocoder quality and speech recognition accuracy. In this paper, noise reduction is performed in the context of a high-quality harmonic zero-phase sine-wave analysis/synthesis system which is characterized by sine-wave amplitudes, a voicing probability, and a fundamental frequency. Least-squared error estimation of a harmonic sine-wave representation leads to a "soft decision" template estimate consisting of sine-wave amplitudes and a voicing probability. The least-squares solution is modified to use template matching with "nearest neighbors." The reconstruction is improved by using the modified least-squares solution only in spectral regions with low signal-to-noise ratio. The results, although preliminary, provide evidence that harmonic zero-phase sine-wave analysis/synthesis, combined with effective estimation of sine-wave amplitudes and probability of voicing, offers a promising approach to noise reduction.

Automatic talker activity labeling for co-channel talker interference suppression

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, Speech Processing 2; VLSI; Audio and Electroacoustics, 3-6 April 1990, pp. 813-816.

Summary

This paper describes a speaker activity detector taking co-channel speech as input and labeling intervals of the input as target-only, jammer-only, or two-speaker (target+jammer). The algorithms applied were borrowed primarily from speaker recognition, thereby allowing us to use speaker-dependent test-utterance-independent information in a front-end for co-channel talker interference suppression. Parameters studied included classifier choice (vector quantization vs. Gaussian), training method (unsupervised vs. supervised), test utterance segmentation (uniform vs. adaptive), and training and testing target-to-jammer ratios. Using analysis interval lengths of 100 ms, performance reached 80% correct detection.
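
As a minimal sketch of the vector-quantization classifier option (the features, codebook sizes, and framing here are assumptions, not the paper's configuration), one codebook can be trained per class and each analysis interval labeled by the codebook with the lowest average quantization distortion.

    # Hypothetical VQ-based interval labeler: one k-means codebook per class
    # (target-only, jammer-only, two-speaker); each analysis interval gets the
    # label whose codebook yields the lowest mean quantization distortion.
    import numpy as np
    from scipy.cluster.vq import kmeans, vq

    def train_codebooks(training_features, codebook_size=64):
        """training_features: dict mapping class label -> (frames x dims) array."""
        return {label: kmeans(np.asarray(feats, dtype=float), codebook_size)[0]
                for label, feats in training_features.items()}

    def label_interval(interval_features, codebooks):
        """Label one analysis interval (e.g. the feature frames within 100 ms)."""
        interval_features = np.asarray(interval_features, dtype=float)
        distortions = {label: vq(interval_features, codebook)[1].mean()
                       for label, codebook in codebooks.items()}
        return min(distortions, key=distortions.get)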

Robust speech recognition using hidden Markov models: overview of a research program

Summary

This report presents an overview of a program of speech recognition research which was initiated in 1985 with the major goal of developing techniques for robust, high-performance speech recognition under the stress and noise conditions typical of a military aircraft cockpit. The work on recognition in stress and noise during 1985 and 1986 produced a robust hidden Markov model (HMM) isolated-word recognition (IWR) system with 99 percent speaker-dependent accuracy for several difficult stress/noise databases, and very high performance for normal speech. Robustness techniques which were developed and applied include multi-style training, robust estimation of parameter variances, perceptually motivated stress-tolerant distance measures, use of time-differential speech parameters, and discriminant analysis. These techniques and others produced more than an order-of-magnitude reduction in isolated-word recognition error rate relative to a baseline HMM system. An important feature of the Lincoln HMM system has been the use of continuous-observation HMM techniques, which provide a good basis for the development of the robustness techniques and avoid the need for a vector quantizer at the input to the HMM system. Beginning in 1987, the robust HMM system was extended to continuous speech recognition for both speaker-dependent and speaker-independent tasks. The robust HMM continuous speech recognizer was integrated in real time with a stressing simulated flight task, which was judged to be very realistic by a number of military pilots. Phrase recognition accuracy on the limited-task-domain (28-word vocabulary) flight task is better than 99.9 percent. Recently, the robust HMM system has been extended to large-vocabulary continuous speech recognition, and has yielded excellent performance in both speaker-dependent and speaker-independent recognition on the DARPA 1000-word vocabulary Resource Management database. Current efforts include further improvements to the HMM system, techniques for the integration of speech recognition with natural language processing, and research on the integration of neural network techniques with HMMs.
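
As a generic, textbook-level sketch of what a continuous-observation HMM involves (a single diagonal-covariance Gaussian per state and the forward algorithm; this is not the Lincoln system's actual topology or any of its robustness extensions), consider the following.

    # Textbook forward-algorithm log-likelihood for a continuous-observation HMM
    # with one diagonal-covariance Gaussian per state. Model sizes and the
    # single-Gaussian assumption are illustrative only.
    import numpy as np
    from scipy.special import logsumexp

    def log_gauss(x, means, variances):
        """Per-state log N(x; mean, diag(var)) for one observation vector x."""
        return -0.5 * np.sum(np.log(2 * np.pi * variances)
                             + (x - means) ** 2 / variances, axis=-1)

    def forward_loglik(obs, log_pi, log_A, means, variances):
        """obs: (T x D) observations; log_pi: (S,) initial state log-probs;
        log_A: (S x S) transition log-probs; means, variances: (S x D)."""
        alpha = log_pi + log_gauss(obs[0], means, variances)
        for x in obs[1:]:
            alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_gauss(x, means, variances)
        return logsumexp(alpha)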