Publications

Sinusoidal coding

Published in:
Chapter 4 in Speech Coding and Synthesis, Elsevier Science Publishers, 1995, pp. 121-173.

Summary

This chapter summarizes the sinewave-based pitch extractor and the high-order all-pole modelling techniques that provided the basis for the multirate Sinusoidal Transform Coder and its application to multi-speaker conferencing.

Energy onset times for speaker identification

Published in:
IEEE Signal Process. Lett., Vol. 1, No. 11, November 1994, pp. 160-162.

Summary

Onset times of resonant energy pulses are measured with the high-resolution Teager operator and used as features in the Reynolds Gaussian-mixture speaker identification algorithm. Feature sets are constructed with primary pitch and secondary pulse locations derived from low and high speech formants. Preliminary testing was performed with a confusable 40-speaker subset from the NTIMIT (telephone channel) database. Speaker identification improved from 55 to 70% correct classification when the full set of new resonant energy-based features was added as an independent stream to conventional mel-cepstra.
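
The discrete Teager-Kaiser operator used for these measurements can be sketched in a few lines of Python with NumPy. The onset picker below (local maxima of the operator output above a fixed fraction of its peak) is a hypothetical stand-in chosen for illustration; the paper's actual measurement rule and its formant-band pulse features are not reproduced here.

```python
import numpy as np


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1].

    The end samples lack a neighbour, so the output is two samples shorter
    than the input; output index i corresponds to input sample i + 1.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def onset_times(x, fs, threshold_ratio=0.5):
    """Hypothetical onset picker (an illustrative assumption, not the paper's rule).

    Returns the times (seconds) of local maxima of the Teager energy that exceed
    threshold_ratio times the largest energy value in the block.
    """
    psi = teager_kaiser(x)
    threshold = threshold_ratio * psi.max()
    onsets = []
    for i in range(1, len(psi) - 1):
        if psi[i] > threshold and psi[i] >= psi[i - 1] and psi[i] > psi[i + 1]:
            onsets.append((i + 1) / fs)  # shift back to the input sample index
    return onsets


# Usage: two damped 500 Hz "resonance pulses" starting at samples 100 and 180.
fs = 8000
n = np.arange(400)
x = np.zeros(len(n))
for start in (100, 180):
    m = n[start:] - start
    x[start:] += 0.9 ** m * np.cos(2 * np.pi * 500 / fs * m)
print(onset_times(x, fs))  # -> [0.0125, 0.0225]
```

In this synthetic example the operator output is largest at the first sample of each pulse and then decays as 0.9^(2m) sin^2(omega), so the picker returns the two pulse starts.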

Formant AM-FM for speaker identification

Published in:
Proc. IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, 25-28 October 1994, pp. 608-611.

Summary

The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude and frequency modulations of speech formants. Such modulations are measured using a high-resolution energy operator and related algorithms for recovering amplitude and frequency from an AM-FM signal. When these features are added to traditional features using an existing SID system with a telephone speech database, SID performance improved by as much as 15%. Energy onset time measurements that yielded improved SID performance are also discussed.
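
Measuring formant modulations first requires isolating one formant band. The sketch below uses a real Gabor bandpass filter, the filter type named in the 1991 ICASSP paper on energy operators listed on this page; the bandwidth-to-alpha mapping, the truncation at three Gaussian widths, and the function name gabor_bandpass are assumptions for illustration, not parameters taken from this paper.

```python
import numpy as np


def gabor_bandpass(x, fs, fc, bandwidth):
    """Bandpass x with a real Gabor filter h(t) = exp(-(alpha*t)**2) * cos(2*pi*fc*t).

    alpha is chosen so that the Gaussian lobe of the magnitude response falls
    to half its peak at fc +/- bandwidth/2 (the image lobe at -fc is ignored).
    The gain is normalised to roughly 1 at fc.
    """
    alpha = np.pi * bandwidth / (2.0 * np.sqrt(np.log(2.0)))
    half_len = int(np.ceil(3.0 * fs / alpha))  # envelope has decayed to about exp(-9) here
    t = np.arange(-half_len, half_len + 1) / fs
    envelope = np.exp(-(alpha * t) ** 2)
    h = envelope * np.cos(2.0 * np.pi * fc * t)
    h /= envelope.sum() / 2.0  # the cosine splits the envelope gain between +fc and -fc
    # Symmetric, odd-length filter with mode="same" gives a zero-phase output.
    return np.convolve(np.asarray(x, dtype=float), h, mode="same")


# Usage: keep a hypothetical 1000 Hz "formant" component, reject a 3000 Hz one.
fs = 8000
n = np.arange(2000)
in_band = np.cos(2 * np.pi * 1000 / fs * n)
out_band = np.cos(2 * np.pi * 3000 / fs * n)
y = gabor_bandpass(in_band + out_band, fs, fc=1000, bandwidth=400)
```

A Gaussian envelope is a natural choice because it has the smallest time-bandwidth product, so for a given bandwidth the filter smears the band's amplitude and frequency modulations as little as possible.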

Energy separation in signal modulations with application to speech analysis

Published in:
IEEE Trans. Signal Process., Vol. 41, No. 10, October 1993, pp. 3024-3051.

Summary

Oscillatory signals that have both an amplitude-modulation (AM) and a frequency-modulation (FM) structure are encountered in almost all communication systems. We have also used these structures recently for modeling speech resonances, being motivated by previous work on investigating fluid dynamics phenomena during speech production that provide evidence for the existence of modulations in speech signals. In this paper, we use a nonlinear differential operator that can detect modulations in AM-FM signals by estimating the product of their time-varying amplitude and frequency. This operator essentially tracks the energy needed by a source to produce the oscillatory signal. To solve the fundamental problem of estimating both the amplitude envelope and instantaneous frequency of an AM-FM signal we develop a novel approach that uses nonlinear combinations of instantaneous signal outputs from the energy operator to separate its output energy product into its amplitude modulation and frequency modulation components. The theoretical analysis is done first for continuous-time signals. Then several efficient algorithms are developed and compared for estimating the amplitude envelope and instantaneous frequency of discrete-time AM-FM signals. These energy separation algorithms are then applied to search for modulations in speech resonances, which we model using AM-FM signals to account for time-varying amplitude envelopes and instantaneous frequencies. Our experimental results provide evidence that bandpass filtered speech signals around speech formants contain amplitude and frequency modulations within a pitch period. Overall, the energy separation algorithms, due to their very low computational complexity and instantaneously-adapting nature, are very useful in detecting modulation patterns in speech and other time-varying signals.
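
The following Python sketch implements one discrete energy separation algorithm of the kind this summary describes, usually called DESA-2: it recovers amplitude and frequency from the energy operator applied to the signal and to its symmetric difference. The function names are hypothetical, and the sketch omits the smoothing and masking a practical analyser would need.

```python
import numpy as np


def teager(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x):
    """DESA-2 style energy separation: per-sample amplitude and frequency estimates.

    Uses y[n] = x[n+1] - x[n-1] and
        omega[n] = 0.5 * arccos(1 - psi[y][n] / (2 * psi[x][n]))   (radians/sample)
        |a[n]|   = 2 * psi[x][n] / sqrt(psi[y][n]).
    Valid for 0 < omega < pi/2. Outputs align with x[2:-2]; samples where
    psi[x] or psi[y] is zero or negative give inf/nan and need masking in practice.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]          # y aligned with x[1:-1]
    psi_x = teager(x)[1:-1]     # aligned with x[2:-2]
    psi_y = teager(y)           # also aligned with x[2:-2]
    ratio = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)
    amplitude = 2.0 * psi_x / np.sqrt(psi_y)
    return amplitude, omega


# Usage: a constant tone, for which DESA-2 is exact up to rounding.
n = np.arange(200)
amp, omega = desa2(0.7 * np.cos(0.3 * n + 0.2))
```

For a constant tone A cos(Omega n + theta) the formulas are exact: y(n) is a sinusoid of amplitude 2A sin(Omega), so Psi[y] = 4A^2 sin^4(Omega) while Psi[x] = A^2 sin^2(Omega), and the two ratios reduce to Omega and A. For slowly varying AM-FM signals they become approximations.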

Detection of transient signals using the energy operator

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 3, ICASSP, 27-30 April 1993, pp. 145-148.

Summary

A function of the Teager-Kaiser energy operator is introduced as a method for detecting transient signals in the presence of amplitude-modulated and frequency-modulated tonal interference. This function has excellent time resolution and is robust in the presence of white noise. The output of the detection function is also independent of the interference-to-transient ratio when that ratio is large. It is demonstrated that the detection function can be applied to interference signals with multiple amplitude-modulated and frequency-modulated tonal components.
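
The property behind such a detector is that the discrete Teager-Kaiser output of a steady tone is constant, so a transient shows up as a sharp departure from that level. The median-plus-threshold rule below is a hypothetical illustration of this idea in Python, not the detection function defined in the paper, and it assumes the tonal interference is close to stationary within the analysed block.

```python
import numpy as np


def teager(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def detect_transients(x, k=3.0):
    """Hypothetical detector, not the detection function from the paper.

    For a steady tone A*cos(omega*n + phi) the discrete Teager energy is the
    constant A**2 * sin(omega)**2, so samples whose energy departs from the
    block median by more than k times that median are flagged.
    Returns indices into x.
    """
    psi = teager(x)
    baseline = np.median(psi)
    flagged = np.flatnonzero(np.abs(psi - baseline) > k * abs(baseline))
    return flagged + 1  # psi index i corresponds to x index i + 1


# Usage: a steady tone with a one-sample transient added at n = 500.
n = np.arange(1000)
x = np.cos(0.2 * n)
x[500] += 2.0
print(detect_transients(x))  # -> [499 500 501]
```

With AM-FM interference the operator output varies slowly with the product of amplitude and frequency, so a practical detector would need a locally adapted baseline rather than one global median.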

Time-scale modification of complex acoustic signals

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Plenary, Special, Audio, Underwater Acoustics, VLSI, Neural Networks, 27-30 April 1993, pp. 213-216.

Summary

A new approach is introduced for time-scale modification of short-duration complex acoustic signals to improve their audibility. The technique constrains the modified signal to take on a specified spectral characteristic while imposing a time-scaled version of the original temporal envelope. Both full-band and sub-band representations of the temporal envelope are considered. In the full-band case, the modified signal is obtained by appropriate selection of its Fourier transform phase. In the sub-band case, using locations of maxima in the sub-band temporal envelopes, the phase of each bandpass signal is formed to preserve "events" in the envelope of the composite signal. The approach is applied to synthetic and actual short-duration acoustic signals consisting of closely-spaced and overlapping sequential time components.

Time-scale modification with temporal envelope invariance

Published in:
Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 17-20 October 1993, pp. 127-130.

Summary

A new approach is introduced for time-scale modification of short-duration complex acoustic signals to improve their audibility. The method preserves the time-scaled temporal envelope of a signal and for enhancement capitalizes on the perceptual importance of a signal's temporal structure. The basis for the approach is a sub-band representation whose channel phases are controlled to shape the temporal envelope of the time-scaled signal. The phase control is derived from locations of events which occur within filterbank outputs. A frame-based generalization of the method imposes phase consistency across consecutive synthesis frames. The approach is applied to synthetic and actual short-duration acoustic signals consisting of closely-spaced and overlapping sequential time components.

Shape invariant time-scale and pitch modification of speech

Published in:
IEEE Trans. Signal Process., Vol. 40, No. 3, March 1992, pp. 497-510.

Summary

The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. The goal of this paper is to develop a time-scale modification system that preserves this shape-invariance property during voicing. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its capability of performing time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.
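
For orientation, a plain sinusoidal time-scale modification can be sketched in Python as below: each sine wave's amplitude and frequency tracks are stretched in time, and phase is re-integrated from the stretched frequency track so that pitch is preserved. This baseline is an assumption for illustration; it does not model the separate vocal-tract and excitation phase contributions that give the paper's system its shape invariance, and the function name time_scale_sines is hypothetical.

```python
import numpy as np


def time_scale_sines(amps, freqs, hop, rate, phase0=None):
    """Baseline sinusoidal time-scale modification (not shape invariant).

    amps, freqs: arrays of shape (num_frames, K) giving each sine wave's
    amplitude and frequency (radians/sample) at frame centres spaced `hop`
    samples apart. The tracks are stretched to a hop of rate * hop samples
    (rate > 1 slows the speech down), amplitude and frequency are linearly
    interpolated, and phase is re-integrated from frequency so pitch is kept.
    """
    amps = np.asarray(amps, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    num_frames, num_sines = amps.shape
    out_hop = int(round(rate * hop))
    t = np.arange((num_frames - 1) * out_hop) / out_hop  # position in frame units
    frames = np.arange(num_frames)
    phases = np.zeros(num_sines) if phase0 is None else np.asarray(phase0, dtype=float)
    out = np.zeros(len(t))
    for k in range(num_sines):
        a = np.interp(t, frames, amps[:, k])
        w = np.interp(t, frames, freqs[:, k])
        # phase[n] = phase0 + w[0] + ... + w[n-1]: running sum of instantaneous frequency
        phase = phases[k] + np.concatenate(([0.0], np.cumsum(w[:-1])))
        out += a * np.cos(phase)
    return out


# Usage: one constant sine track over 5 frames (hop 80), slowed down by 1.5.
amps = np.ones((5, 1))
freqs = np.full((5, 1), 0.3)
y = time_scale_sines(amps, freqs, hop=80, rate=1.5)  # 480 samples instead of 320
```

This keeps pitch unchanged but makes no attempt to keep the relative phases of the sine waves consistent with the original excitation, so the waveform shape is not guaranteed to be preserved; that is the gap the shape-invariant system addresses.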

Low-rate speech coding based on the sinusoidal model

Published in:
Chapter 6 in Advances in Speech Signal Processing, Marcel Dekker, Inc., 1992, pp. 165-208.

Summary

One approach to the problem of representation of speech signals is to use the speech production model in which speech is viewed as the result of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract. In many applications it suffices to assume that the glottal excitation can be in one of two possible states corresponding to voiced or unvoiced speech. In attempts to design high-quality speech coders at the midband rates, generalizations of the binary excitation model have been developed. One such approach is multipulse (Atal and Remde, 1982) which uses more than one pitch pulse to model voiced speech and a possibly random set of pulses to model unvoiced speech. Code excited linear prediction (CELP) (Schroeder and Atal, 1985) is another representation which models the excitation as one of a number of random sequences or "codewords" superimposed on periodic pitch pulses. In this chapter the goal is also to generalize the model for the glottal excitation; but instead of using impulses as in multipulse or random sequences as in CELP, the excitation is assumed to be composed of sinusoidal components of arbitrary amplitudes, frequencies, and phases (McAulay and Quatieri, 1986).
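
Under this sinusoidal model each analysis frame is synthesized as a sum of cosines, s(n) = sum_k A_k cos(omega_k n + phi_k). The Python sketch below shows only that synthesis step for a single frame; the harmonic frequencies and 1/k amplitudes in the usage are hypothetical, and the parameter estimation, frame-to-frame matching, and quantization of an actual coder are omitted.

```python
import numpy as np


def synthesize_frame(amps, freqs_hz, phases, fs, num_samples):
    """Sum of sinusoids s[n] = sum_k A_k * cos(2*pi*f_k*n/fs + phi_k) for one frame."""
    amps = np.asarray(amps, dtype=float)[:, None]
    omegas = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)[:, None] / fs
    phases = np.asarray(phases, dtype=float)[:, None]
    n = np.arange(num_samples)[None, :]
    # Broadcast to (components, samples) and sum over components.
    return np.sum(amps * np.cos(omegas * n + phases), axis=0)


# Usage: a hypothetical voiced frame with ten harmonics of a 200 Hz pitch.
fs = 8000
k = np.arange(1, 11)
frame = synthesize_frame(amps=1.0 / k, freqs_hz=200.0 * k, phases=np.zeros(10),
                         fs=fs, num_samples=400)
```

Because every component here is a harmonic of 200 Hz, the frame repeats every fs / 200 = 40 samples; with arbitrary (non-harmonic) frequencies the same function synthesizes unvoiced-like frames as well.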

Speech nonlinearities, modulations, and energy operators

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 14-17 May 1991, pp. 421-424.

Summary

In this paper, we investigate an AM-FM model for representing modulations in speech resonances. Specifically, we propose a frequency modulation (FM) model for the time-varying formants whose amplitude varies as the envelope of an amplitude-modulated (AM) signal. To detect the modulations we apply the energy operator $\Psi(x) = \dot{x}^2 - x\ddot{x}$ and its discrete counterpart. We found that $\Psi$ can approximately track the envelope of AM signals, the instantaneous frequency of FM signals, and the product of these two functions in the general case of AM-FM signals. Several experiments are reported on the applications of this AM-FM modeling to speech signals, bandpass filtered via Gabor filtering.
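
The worked derivation below shows why the operator tracks the amplitude-frequency product: for a constant-amplitude, constant-frequency cosine it returns exactly $A^2\Omega_c^2$. The discrete form shown is the standard Teager-Kaiser operator; treating it as the "discrete counterpart" used in this paper is an assumption.

```latex
% Continuous-time operator applied to a constant-parameter cosine
x(t) = A\cos(\Omega_c t + \theta), \quad
\dot{x}(t) = -A\Omega_c \sin(\Omega_c t + \theta), \quad
\ddot{x}(t) = -A\Omega_c^2 \cos(\Omega_c t + \theta)

\Psi(x) = \dot{x}^2 - x\ddot{x}
        = A^2\Omega_c^2\left[\sin^2(\Omega_c t + \theta) + \cos^2(\Omega_c t + \theta)\right]
        = A^2\Omega_c^2

% Discrete-time counterpart, with x(n) = A\cos(\Omega n + \theta)
% and the identity \cos(a-b)\cos(a+b) = \cos^2 a - \sin^2 b
\Psi[x(n)] = x^2(n) - x(n-1)\,x(n+1) = A^2\sin^2\Omega \approx A^2\Omega^2 \quad (\Omega \ll 1)
```

For AM-FM signals whose amplitude and frequency change slowly relative to the carrier, the same expressions hold approximately with $A$ and $\Omega_c$ replaced by their instantaneous values.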