Publications

Low-rate speech coding based on the sinusoidal model

Published in:
Chapter 6 in Advances in Speech Signal Processing, Marcel Dekker, Inc., 1992, pp. 165-208.

Summary

One approach to the problem of representation of speech signals is to use the speech production model in which speech is viewed as the result of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract. In many applications it suffices to assume that the glottal excitation can be in one of two possible states corresponding to voiced or unvoiced speech. In attempts to design high-quality speech coders at the midband rates, generalizations of the binary excitation model have been developed. One such approach is multipulse (Atal and Remde, 1982) which uses more than one pitch pulse to model voiced speech and a possibly random set of pulses to model unvoiced speech. Code excited linear prediction (CELP) (Schroeder and Atal, 1985) is another representation which models the excitation as one of a number of random sequences or "codewords" superimposed on periodic pitch pulses. In this chapter the goal is also to generalize the model for the glottal excitation; but instead of using impulses as in multipulse or random sequences as in CELP, the excitation is assumed to be composed of sinusoidal components of arbitrary amplitudes, frequencies, and phases (McAulay and Quatieri, 1986).
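As a rough illustration of the excitation model described above, the following minimal sketch builds one frame as a sum of sinusoids with arbitrary amplitudes, frequencies, and phases. The function name, the parameter values, and the 8 kHz sample rate are assumptions chosen for the example, not values from the chapter, and the time-varying vocal tract filter is deliberately left out.

```python
import math


def synthesize_frame(amps, freqs_hz, phases, sample_rate, num_samples):
    """Return s[n] = sum_k A_k * cos(2*pi*f_k*n/fs + phi_k) for one frame.

    Hypothetical helper for illustration only: parameters are held
    constant over the frame, which a real coder would not assume.
    """
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        frame.append(
            sum(a * math.cos(2.0 * math.pi * f * t + p)
                for a, f, p in zip(amps, freqs_hz, phases))
        )
    return frame


# Two components: 100 Hz at amplitude 1.0, 200 Hz at amplitude 0.5
# (illustrative values only).
frame = synthesize_frame(
    amps=[1.0, 0.5],
    freqs_hz=[100.0, 200.0],
    phases=[0.0, math.pi / 2],
    sample_rate=8000.0,
    num_samples=80,
)
```

Here each component is described by only three numbers per frame, which is what makes the representation a candidate for coding.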

Speech analysis/synthesis based on a sinusoidal representation

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, No. 4, August 1986, pp. 744-754.

Summary

A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.
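The cubic phase interpolation step mentioned above can be sketched as follows. The summary does not give the formulas, so the closed-form coefficients below are an assumption based on the form commonly quoted for this kind of interpolator: a cubic that matches the measured phase and frequency at both frame boundaries, with the integer number of 2π wraps `M` chosen to make the track as smooth as possible. All names and numeric values are hypothetical.

```python
import math


def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Coefficients for theta(t) = theta0 + omega0*t + alpha*t**2 + beta*t**3.

    theta0/omega0 are phase (rad) and frequency (rad/sample) at t = 0,
    theta1/omega1 the same at t = T. Returns (alpha, beta, M), where M is
    the number of 2*pi wraps added to theta1 (assumed smoothness rule).
    """
    # Pick the unwrapping integer closest to the "smoothest" real value.
    x = ((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2.0)
    M = round(x / (2.0 * math.pi))

    d = theta1 + 2.0 * math.pi * M - theta0 - omega0 * T  # phase mismatch
    dw = omega1 - omega0                                   # frequency change
    alpha = 3.0 * d / T**2 - dw / T
    beta = -2.0 * d / T**3 + dw / T**2
    return alpha, beta, M


def phase_at(t, theta0, omega0, alpha, beta):
    """Evaluate the interpolated phase at time t (in samples)."""
    return theta0 + omega0 * t + alpha * t * t + beta * t**3


# Illustrative frame: 200 Hz drifting to 210 Hz over 80 samples at 8 kHz.
theta0, omega0 = 0.3, 2.0 * math.pi * 200.0 / 8000.0
theta1, omega1 = 1.1, 2.0 * math.pi * 210.0 / 8000.0
T = 80
alpha, beta, M = cubic_phase_coeffs(theta0, omega0, theta1, omega1, T)
end_phase = phase_at(T, theta0, omega0, alpha, beta)
end_freq = omega0 + 2.0 * alpha * T + 3.0 * beta * T * T
```

By construction the cubic ends at `theta1` plus a multiple of 2π and its slope ends at `omega1`, so adjacent frames join without phase or frequency jumps.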

A linear prediction vocoder with voice excitation

Published in:
Proc. EASCON, 29 September - 1 October 1975, pp. 30-a-30-g.

Summary

A speech bandwidth compression system, which employs voice excitation in conjunction with a Linear Predictive Coding (LPC) parameterization of the vocal tract filter, is described. To generate the excitation signal, the transmitted speech baseband is broadened at the receiver with a nonlinear distorter, and spectrally flattened by means of an adaptive inverse filter whose parameters are obtained through LPC analysis of the distorted baseband. The voice-excited linear prediction (VELP) system has been implemented in real time on the Fast Digital Processor at Lincoln Laboratory. A detailed description of an 8 kbps version of VELP is given. VELP offers promise as a good quality, medium rate speech compression system which, by avoiding the pitch problem, performs relatively well for telephone quality input speech.
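To make the bandwidth-broadening idea concrete, here is a small sketch of how a memoryless nonlinearity creates energy above the baseband. The summary does not say which nonlinear distorter VELP uses; full-wave rectification is an assumption picked only because it is easy to analyze, and the test tone, frame length, and helper name are hypothetical. The adaptive inverse filter that flattens the result is not shown.

```python
import math


def dft_amplitude(x, k):
    """Amplitude of DFT bin k, scaled so a unit-amplitude cosine gives 1.0."""
    N = len(x)
    re = sum(v * math.cos(2.0 * math.pi * k * n / N) for n, v in enumerate(x))
    im = -sum(v * math.sin(2.0 * math.pi * k * n / N) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / N


N = 64   # frame length in samples (illustrative)
k0 = 4   # baseband tone sits exactly on DFT bin 4

baseband = [math.cos(2.0 * math.pi * k0 * n / N) for n in range(N)]

# Assumed distorter: full-wave rectifier. It maps the tone at bin k0 to
# DC plus even harmonics at bins 2*k0, 4*k0, ...
distorted = [abs(v) for v in baseband]

before = dft_amplitude(baseband, 2 * k0)   # essentially zero
after = dft_amplitude(distorted, 2 * k0)   # clearly non-zero
```

The distorted signal has a strong component at twice the tone frequency where the baseband had none, which is the kind of spectral broadening the receiver relies on before flattening.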

Predictive coding in a homomorphic vocoder

Published in:
IEEE Trans. Audio Electroacoust., Vol. AU-19, No. 3, September 1971, pp. 243-248.

Summary

Application of a type of predictive coding to the channel signals of a homomorphic vocoder has produced sizable bit rate reductions. With only slight degradation in speech quality, reduction (for the spectral envelope information) from 7800 to 4000 bits/s was achieved. A technique for obtaining the formant frequencies from the predictive coding parameters is described; this approach promises further bit rate reductions. As a byproduct of this study of predictive coding, direct and cascade form speech synthesizers are compared on the basis of differing quantization effects.