Publications

Speech transformations based on a sinusoidal representation

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, No. 6, December 1986, pp. 1449-1464.

Summary

In this paper a new speech analysis/synthesis technique is presented that provides the basis for a general class of speech transformations, including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is also capable of recovering and modifying non-speech signals such as music and marine biologic sounds, as well as multiple speakers and speakers in the presence of interference such as noise and musical backgrounds.
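
To make time-scale and pitch modification concrete, here is a minimal Python sketch that is not the paper's method. It assumes each sine wave is already described by per-frame amplitudes and frequencies (in radians/sample), stretches the frame spacing to change the rate, multiplies the frequencies to change the pitch, and obtains phase by simply integrating frequency sample by sample. The function `synthesize_tracks` and its parameters are hypothetical; the paper's transformations handle phase more carefully, and this naive frequency scaling moves the spectral envelope along with the pitch.

```python
import math

def synthesize_tracks(tracks, hop, time_scale=1.0, pitch_scale=1.0):
    """Sum-of-sine-waves synthesis with naive time-scale and pitch modification.

    tracks: list of (amplitudes, frequencies) pairs, one per sine wave, each
            holding one value per analysis frame; frequencies in radians/sample.
    hop: analysis frame spacing in samples.
    time_scale: > 1 slows the speech down (frames are spread further apart).
    pitch_scale: multiplies every frequency (this also shifts the envelope).
    """
    out_hop = max(1, round(hop * time_scale))  # synthesis frame spacing
    n_frames = len(tracks[0][0])
    out = [0.0] * ((n_frames - 1) * out_hop + 1)
    for amps, freqs in tracks:
        phase = 0.0
        for f in range(n_frames - 1):
            a0, a1 = amps[f], amps[f + 1]
            w0, w1 = freqs[f] * pitch_scale, freqs[f + 1] * pitch_scale
            for n in range(out_hop):
                t = n / out_hop  # position within the synthesis frame
                out[f * out_hop + n] += (a0 + (a1 - a0) * t) * math.cos(phase)
                phase += w0 + (w1 - w0) * t  # integrate instantaneous frequency
        out[-1] += amps[-1] * math.cos(phase)
    return out

# Usage: one steady sine wave at 0.2 rad/sample, 11 frames, 100-sample hop.
track = ([1.0] * 11, [0.2] * 11)
original = synthesize_tracks([track], hop=100)
slower = synthesize_tracks([track], hop=100, time_scale=2.0)
higher = synthesize_tracks([track], hop=100, pitch_scale=2.0)
```

Doubling `time_scale` doubles the output length while leaving the frequency unchanged; doubling `pitch_scale` doubles the frequency while leaving the length unchanged.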

Speech analysis/synthesis based on a sinusoidal representation

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, No. 4, August 1986, pp. 744-754.

Summary

A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs, including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations, including time-scale and pitch-scale modification, and midrate speech coding.
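
Two steps named above, peak-picking and cubic phase interpolation, can be sketched in Python as follows. The cubic θ(n) = θ0 + ω0·n + α·n² + β·n³ is required to match the measured phase (modulo 2π) and frequency at both ends of an S-sample frame, and the integer M selects the 2π unwrapping that keeps the track smoothest; this is our reading of the maximally smooth criterion. The function names and example values are hypothetical.

```python
import math

def pick_peaks(magnitudes, threshold=0.0):
    """Indices of local maxima of a magnitude spectrum above a threshold."""
    return [k for k in range(1, len(magnitudes) - 1)
            if magnitudes[k] > magnitudes[k - 1]
            and magnitudes[k] >= magnitudes[k + 1]
            and magnitudes[k] > threshold]

def cubic_phase_coeffs(theta0, omega0, theta1, omega1, S):
    """alpha, beta of theta(n) = theta0 + omega0*n + alpha*n**2 + beta*n**3.

    The cubic hits theta1 (mod 2*pi) with slope omega1 at n = S; M is the
    2*pi unwrapping integer that makes the phase track maximally smooth.
    """
    M = round(((theta0 + omega0 * S - theta1) + (omega1 - omega0) * S / 2)
              / (2 * math.pi))
    delta = theta1 - theta0 - omega0 * S + 2 * math.pi * M
    alpha = 3 * delta / S ** 2 - (omega1 - omega0) / S
    beta = -2 * delta / S ** 3 + (omega1 - omega0) / S ** 2
    return alpha, beta

def synthesize_frame(a0, a1, theta0, omega0, theta1, omega1, S):
    """One frame of one sine wave: linear amplitude, cubic phase."""
    alpha, beta = cubic_phase_coeffs(theta0, omega0, theta1, omega1, S)
    return [(a0 + (a1 - a0) * n / S)
            * math.cos(theta0 + omega0 * n + alpha * n ** 2 + beta * n ** 3)
            for n in range(S)]

# Usage: a track whose frequency rises from 0.5 to 0.6 rad/sample over 80 samples.
S, th0, w0, th1, w1 = 80, 0.3, 0.5, 2.0, 0.6
alpha, beta = cubic_phase_coeffs(th0, w0, th1, w1, S)
end_phase = th0 + w0 * S + alpha * S ** 2 + beta * S ** 3  # equals th1 + 2*pi*M
end_freq = w0 + 2 * alpha * S + 3 * beta * S ** 2           # equals w1
frame = synthesize_frame(1.0, 0.5, th0, w0, th1, w1, S)
```

For a steady sine wave whose end phase is exactly the start phase advanced by ω·S, the sketch returns α = β = 0, so the cubic reduces to the usual linear phase.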

Robust HMM-based techniques for recognition of speech produced under stress and in noise

Published in:
Proc. Speech Tech '86, 28-30 April 1986, pp. 241-249.

Summary

Substantial improvements in speech recognition performance on speech produced under stress and in noise have been achieved through the development of techniques for enhancing the robustness of a baseline isolated-word Hidden Markov Model recognizer. The baseline HMM is a continuous-observation system using mel-frequency cepstra as the observation parameters. Enhancement techniques that were developed and tested include: placing a lower limit on the estimated variances of the observations; addition of temporal difference parameters; improved duration modelling; use of fixed diagonal covariance distance functions, with variances adjusted according to perceptual considerations; cepstral domain stress compensation; and multi-style training, where the system is trained on speech spoken with a variety of talking styles. With perceptually motivated covariance and a combination of normal (single-frame) and differential cepstral observations, average error rates over five simulated-stress conditions were reduced from 20% (baseline) to 2.5% on a simulated-stress database (105-word vocabulary, eight talkers, five conditions). With variance limiting, normal plus differential observations, and multi-style training, an error rate of 1.8% was achieved. Additional tests were conducted on a database including nine talkers and eight talking styles, with speech produced under two levels of motor-workload stress. Substantial reductions in error rate were demonstrated for the noise and workload conditions when multiple talking styles, rather than only normal speech, were used in training. In experiments conducted in simulated fighter cockpit noise, it was shown that error rates could be reduced significantly by training under multiple noise exposure conditions.
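
Three of the listed ingredients are easy to sketch in Python: temporal difference (delta) parameters appended to the cepstra, a lower limit on the variances, and the diagonal-Gaussian log-likelihood those variances feed. The summary does not give the exact difference formula or floor value, so the regression over ±K frames (K = 2) and the floor in the example are assumptions, and the function names are hypothetical.

```python
import math

def delta_features(frames, K=2):
    """Regression-style temporal differences of cepstral frames.

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), with frame
    indices clamped at the utterance edges (an assumed definition).
    """
    T = len(frames)
    norm = 2 * sum(k * k for k in range(1, K + 1))
    deltas = []
    for t in range(T):
        deltas.append([
            sum(k * (frames[min(t + k, T - 1)][i] - frames[max(t - k, 0)][i])
                for k in range(1, K + 1)) / norm
            for i in range(len(frames[t]))
        ])
    return deltas

def floor_variances(variances, floor):
    """Place a lower limit on the estimated observation variances."""
    return [max(v, floor) for v in variances]

def diag_gauss_loglik(x, mean, variances):
    """Log-density of a diagonal-covariance Gaussian observation model."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, variances))

# Usage: static cepstra plus deltas form the augmented observation vector.
cepstra = [[0.5 * t, 2.0] for t in range(10)]
observations = [c + d for c, d in zip(cepstra, delta_features(cepstra))]
```

Without the floor, a variance estimated from very few frames can approach zero and make the log-likelihood arbitrarily large for observations near the mean, which is one reason to limit it.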

Adaptive noise cancellation in a fighter cockpit environment

Published in:
ICASSP'84, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 19-21 March 1984.

Summary

In this paper we discuss some preliminary results on using Widrow's Adaptive Noise Cancelling (ANC) algorithm to reduce the background noise present in a fighter pilot's speech. With a dominant noise source present and with the pilot wearing an oxygen facemask, we demonstrate that good (>10 dB) cancellation of the additive noise and little speech distortion can be achieved by having the reference microphone attached to the outside of the facemask and by updating the filter coefficients only during silence intervals.
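
A minimal LMS sketch of the two-microphone arrangement is shown below. It assumes a per-sample speech/silence flag is available (how silence is detected is not covered here). The reference microphone feeds an adaptive FIR filter whose output is subtracted from the facemask microphone signal, and the weights are updated only while the flag indicates silence. The filter length, step size, acoustic path, and names are illustrative assumptions, and the synthetic example contains no speech.

```python
import math
import random

def lms_noise_canceller(primary, reference, speech_active, taps=8, mu=0.01):
    """Widrow-style LMS adaptive noise canceller with silence-gated updates.

    primary: facemask-microphone signal (speech + noise)
    reference: external noise-reference microphone signal
    speech_active: per-sample flags; weights are frozen while True
    Returns (cleaned_output, final_weights).
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # recent reference samples, newest first
    out = []
    for d, r, active in zip(primary, reference, speech_active):
        buf = [r] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimate of the noise in d
        e = d - y                                    # canceller output
        out.append(e)
        if not active:                               # adapt only during silence
            w = [wi + 2 * mu * e * bi for wi, bi in zip(w, buf)]
    return out, w

# Usage: the noise reaches the facemask microphone through a short FIR path.
rng = random.Random(1)
ref = [rng.gauss(0.0, 1.0) for _ in range(4000)]
path = [0.6, -0.3, 0.1]  # hypothetical acoustic path, for illustration only
noise = [sum(path[k] * ref[n - k] for k in range(len(path)) if n >= k)
         for n in range(len(ref))]
cleaned, weights = lms_noise_canceller(noise, ref, [False] * len(ref), taps=4, mu=0.02)
p_in = sum(v * v for v in noise[-1000:])
p_out = sum(v * v for v in cleaned[-1000:])
print(f"cancellation: {10 * math.log10(p_in / max(p_out, 1e-300)):.1f} dB")
```

Freezing the weights during speech keeps the filter from adapting to speech that leaks into the reference microphone, which would otherwise cancel part of the speech itself.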

Experience with speech communication in packet networks

Published in:
IEEE J. Sel. Areas Commun., Vol. SAC-1, No. 6, December 1983, pp. 963-980.

Summary

The integration of digital voice with data in a common packet-switched network system offers a number of potential benefits, including reduced system cost through sharing of switching and transmission resources, flexible internetworking among systems utilizing different transmission media, and enhanced services for users requiring access to both voice and data communications. Issues that have had to be addressed to realize these benefits include reconstitution of speech from packets arriving at nonuniform intervals, maximization of packet speech multiplexing efficiency, and determination of the implementation requirements for terminals and switching in a large-scale packet voice/data system. A series of packet speech systems experiments to address these issues has been conducted under the sponsorship of the Defense Advanced Research Projects Agency (DARPA). In the initial experiments on the ARPANET, the basic feasibility of speech communication on a store-and-forward packet network was demonstrated. Techniques were developed for reconstitution of speech from packets, and protocols were developed for call setup and for speech transport. Later speech experiments utilizing the Atlantic packet satellite network (SATNET) led to the development of techniques for efficient voice conferencing in a broadcast environment, and for internetting speech between a store-and-forward net (ARPANET) and a broadcast net (SATNET). Large-scale packet speech multiplexing experiments could not be carried out on ARPANET or SATNET, because their link capacities severely restrict the number of speech users that can be accommodated. However, experiments are currently being carried out using a wide-band satellite-based packet system designed to accommodate a sufficient number of simultaneous users to support realistic experiments in efficient statistical multiplexing.
Key developments to date associated with the wide-band experiments have been 1) techniques for internetting via voice/data gateways from a variety of local access networks (packet cable, packet radio, and circuit-switched) to a long-haul broadcast satellite network and 2) compact implementations of packet voice terminals with full protocol and voice capabilities. Basic concepts and issues associated with packet speech systems are described. Requirements and techniques for speech processing, voice protocols, packetization and reconstitution, conferencing, and multiplexing are discussed in the context of a generic packet speech system configuration. Specific experimental configurations and key packet speech results on the ARPANET, SATNET, and wide-band system are reviewed.
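
Reconstituting speech from packets that arrive at nonuniform intervals is commonly handled with a fixed playout delay. The Python sketch below is a generic version of that idea, not the protocol used in the DARPA experiments. The first packet to arrive starts the playout clock after an assumed buffering delay, each later packet gets a slot one frame period after its predecessor, and packets that arrive after their slot, like those that never arrive, must be replaced by fill. Names and timing values are hypothetical.

```python
def playout_schedule(packets, frame_ms, delay_ms):
    """Fixed-delay playout plan for packet speech.

    packets: list of (sequence_number, arrival_time_ms) for packets that arrived.
    frame_ms: speech duration carried by each packet.
    delay_ms: buffering added to the first packet's arrival time.
    Returns (seq, playout_time_ms, status) for every sequence number in range,
    where status is "play", "late" (arrived after its slot) or "missing".
    """
    ref_seq, ref_arrival = min(packets, key=lambda p: p[1])  # first to arrive
    arrivals = dict(packets)
    schedule = []
    for seq in range(min(arrivals), max(arrivals) + 1):
        t_play = ref_arrival + delay_ms + (seq - ref_seq) * frame_ms
        if seq not in arrivals:
            status = "missing"
        elif arrivals[seq] > t_play:
            status = "late"
        else:
            status = "play"
        schedule.append((seq, t_play, status))
    return schedule

# Usage: 20-ms packets with jittered arrivals; packet 3 never arrives.
packets = [(0, 100), (1, 125), (2, 190), (4, 170)]
plan = playout_schedule(packets, frame_ms=20, delay_ms=40)
```

Raising the delay from 40 ms to 60 ms rescues packet 2 in this example, illustrating the usual trade between added end-to-end delay and lost speech.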

Frequency sampling of the short-time Fourier-transform magnitude for signal reconstruction

Published in:
J. Opt. Soc. Amer., Vol. 73, November 1983, pp. 1523-1526.

Summary

Unique recovery of a signal from the magnitude (modulus) of the Fourier transform has been of long-standing interest in image and optical processing in which Fourier-transform phase is lost or difficult to measure. We investigate an alternative problem of recovering a signal from the Fourier-transform magnitude of overlapping regions of the signal, i.e., from the short-time (or -space) Fourier-transform magnitude. Recently it was established that a discrete-time signal x(n) can be uniquely obtained under mild restrictions from its short-time Fourier-transform magnitude. In this paper we extend this result to the case when the short-time Fourier-transform magnitude is known at only one or two frequencies for each n. We also present a recursive algorithm for recovering a sequence from such samples and demonstrate the algorithm with an example.
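
To illustrate why STFT-magnitude samples at only two frequencies can suffice, here is a simplified recursive sketch in Python, which may differ from the paper's algorithm. It assumes a real signal that is zero for n < 0 with x(0) > 0 (the overall sign is otherwise unrecoverable), a known causal window with w(0) ≠ 0, and the two frequencies ω = 0 and ω = π. At each n, |X(n, 0)| leaves two candidates for x(n), and |X(n, π)| picks between them; the choice can fail only in degenerate cases where both candidates give the same |X(n, π)|. Function names and data are hypothetical.

```python
def stft_mag_0_pi(x, w):
    """|X(n, 0)| and |X(n, pi)| for n = 0..len(x)-1, where
    X(n, omega) = sum_m x(m) w(n - m) exp(-j omega m) and w is causal."""
    mags = []
    for n in range(len(x)):
        ms = range(max(0, n - len(w) + 1), n + 1)
        mags.append((abs(sum(w[n - m] * x[m] for m in ms)),
                     abs(sum(w[n - m] * x[m] * (-1) ** m for m in ms))))
    return mags

def recover_from_two_frequencies(mags, w):
    """Recursively recover x(n), assuming x(n) = 0 for n < 0 and x(0) > 0."""
    x = []
    for n, (mag0, magpi) in enumerate(mags):
        ms = range(max(0, n - len(w) + 1), n)  # previously recovered samples
        A = sum(w[n - m] * x[m] for m in ms)              # known part of X(n, 0)
        B = sum(w[n - m] * x[m] * (-1) ** m for m in ms)  # known part of X(n, pi)
        # |X(n, 0)| = |A + w(0) x(n)| leaves two candidates for x(n) ...
        candidates = [(mag0 - A) / w[0], (-mag0 - A) / w[0]]
        # ... and |X(n, pi)| = |B + (-1)^n w(0) x(n)| selects between them.
        x.append(min(candidates,
                     key=lambda c: abs(abs(B + (-1) ** n * w[0] * c) - magpi)))
    return x

# Usage
window = [1.0, 0.8, 0.5, 0.3]
signal = [1.0, -0.5, 2.0, -1.5, 0.7, 1.2, -0.3, 0.9]
recovered = recover_from_two_frequencies(stft_mag_0_pi(signal, window), window)
```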

The Experimental Integrated Switched Network - a system-level network test facility

Published in:
Proc. 1983 IEEE Military Communications Conf., MILCOM, 31 October-2 November 1983.

Summary

An Experimental Integrated Switched Network (EISN) has been developed to provide a system-level testbed for the evaluation of advanced communications networking techniques, including survivable network routing algorithms using a mix of transmission media, for application in the Defense Switched Network (DSN). EISN includes five CONUS sites linked by a wideband demand-assigned satellite channel and by dialed-up terrestrial trunks for alternate satellite/terrestrial routing experiments. Experiments to date have validated techniques for integrating circuit-switched terrestrial systems with the demand-assigned satellite system, and for establishing alternate routes over satellite and terrestrial paths. Currently, candidate routing algorithms for application in the DSN are being implemented and tested using external routing/controller processors attached to digital circuit switches at EISN sites. EISN is also being used to support data communication experiments using DoD standard data protocols in a combined satellite/terrestrial network environment. Work is ongoing both in system experiments and in testbed development to add further capabilities. This paper presents a description and status report on both the testbed and the experimental efforts.

Object detection by two-dimensional linear prediction

Published in:
MIT Lincoln Laboratory Report TR-632

Summary

An important component of any automated image analysis system is the detection and classification of objects. In this report, we consider the first of these problems, where the specific goal is to detect anomalous areas (e.g., man-made objects) in the textured backgrounds of aerial photographs, such as trees, grass, and fields. Our detection algorithm relies on a significance test that adapts itself to the changing background in such a way that a constant false alarm rate is maintained. Furthermore, this test has a potentially practical implementation, since it can be expressed in terms of the residuals of an adaptive two-dimensional linear predictor. The algorithm is demonstrated with both synthetic and real-world images.
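
The Python sketch below illustrates the general approach under simplifying assumptions. A three-coefficient quarter-plane predictor is fitted by least squares to the whole image (the report's predictor adapts to the changing background, which this sketch does not), and a pixel is declared anomalous when its residual exceeds k robust standard deviations. If the residuals are roughly Gaussian and white, a fixed k gives an approximately constant false-alarm rate. The predictor support, the threshold k = 4, the synthetic texture, and all names are assumptions.

```python
import random
import statistics

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for j in range(c, 4):
                M[r][j] -= f * M[c][j]
    x = [0.0] * 3
    for c in (2, 1, 0):
        x[c] = (M[c][3] - sum(M[c][j] * x[j] for j in range(c + 1, 3))) / M[c][c]
    return x

def neighbours(img, i, j):
    """Quarter-plane support: upper, left and upper-left neighbours."""
    return (img[i - 1][j], img[i][j - 1], img[i - 1][j - 1])

def fit_predictor(img):
    """Least-squares 2-D linear predictor coefficients over the whole image."""
    R = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for i in range(1, len(img)):
        for j in range(1, len(img[0])):
            v = neighbours(img, i, j)
            for p in range(3):
                r[p] += v[p] * img[i][j]
                for q in range(3):
                    R[p][q] += v[p] * v[q]
    return solve3(R, r)

def detect(img, k=4.0):
    """Flag pixels whose prediction residual exceeds k robust standard deviations."""
    a = fit_predictor(img)
    residual = {}
    for i in range(1, len(img)):
        for j in range(1, len(img[0])):
            pred = sum(ap * vp for ap, vp in zip(a, neighbours(img, i, j)))
            residual[(i, j)] = img[i][j] - pred
    sigma = statistics.median(abs(e) for e in residual.values()) / 0.6745
    return sorted(pos for pos, e in residual.items() if abs(e) > k * sigma)

# Usage: a synthetic autoregressive texture with a bright 3x3 "object".
rng = random.Random(0)
N = 32
img = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        up = img[i - 1][j] if i > 0 else 0.0
        left = img[i][j - 1] if j > 0 else 0.0
        img[i][j] = 0.5 * up + 0.4 * left + rng.gauss(0.0, 1.0)
for i in range(16, 19):
    for j in range(16, 19):
        img[i][j] += 10.0
hits = detect(img)
```

The median-based noise estimate is used so that the object's own large residuals barely inflate the threshold.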

Implementation of 2-D digital filters by iterative methods

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-30, No. 3, June 1982, pp. 473-487.

Summary

A two-dimensional (2-D) rational filter can be implemented by an iterative computation involving only finite-extent impulse response (FIR) filtering operations, provided a certain convergence criterion is met. In this paper, we generalize this procedure so that the convergence criterion is satisfied for any stable 2-D rational transfer function. One formulation which guarantees convergence invokes a relaxed form of the iterative computation along with prefiltering the numerator and denominator polynomials of the rational transfer function. This implementation may be applied with a frequency-varying relaxation parameter for increasing the rate of convergence. An alternative generalization uses several previously computed iterates, unlike our first modification which utilizes only the most recently computed iterate. This formulation can potentially guarantee convergence and also increase the convergence rate without the requirement of prefiltering. Another extension of the iterative computation incorporates constraints (e.g., positivity or finite extent) on the output of each iteration. Proof of convergence of such constrained iterations relies on the concept of a nonexpansive operator. In particular, the error introduced within the converging solution resulting from a finite-extent constraint is shown to satisfy a homogeneous partial difference equation. Finally, this error computation leads to an important link between our iterative implementation with constraints and an iterative solution to partial difference equations (e.g., Laplace's equation) with known boundary conditions.
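
The convergence argument can be checked one frequency at a time. With b normalized so that b(0, 0) = 1, the basic iteration y_{k+1} = a * x + (δ − b) * y_k (where * is 2-D convolution, so only FIR operations are involved) has an error that evolves as e_{k+1}(ω) = (1 − B(ω)) e_k(ω), so it converges only where |1 − B(ω)| < 1. One way to guarantee convergence for any stable B, which we assume corresponds to the paper's prefiltering, is to multiply numerator and denominator by the conjugate B*(ω) and apply a relaxation parameter λ: the error then scales by 1 − λ|B(ω)|², whose magnitude is below 1 whenever 0 < λ < 2 / max|B|². The Python sketch evaluates both iterations on a frequency grid for a hypothetical separable filter; the names and the choice of λ are assumptions.

```python
import cmath
import math

def freq_response(coeffs, w1, w2):
    """Frequency response of a 2-D FIR mask {(n1, n2): value}."""
    return sum(v * cmath.exp(-1j * (w1 * n1 + w2 * n2))
               for (n1, n2), v in coeffs.items())

def iterate(A, B, X, iterations, lam=1.0, prefilter=False):
    """Frequency-domain view of y_{k+1} = lam*a*x + (delta - lam*b)*y_k.

    A, B, X are the numerator, denominator and input values at one frequency.
    With prefilter=True both A and B are multiplied by conj(B), so the
    effective denominator |B|^2 is real and positive.
    """
    if prefilter:
        A, B = A * B.conjugate(), B * B.conjugate()
    Y = 0j
    for _ in range(iterations):
        Y = lam * A * X + (1 - lam * B) * Y
    return Y

# Usage: H = 1 / ((1 - 0.5 z1^-1)(1 - 0.5 z2^-1)), a stable separable example.
b = {(0, 0): 1.0, (1, 0): -0.5, (0, 1): -0.5, (1, 1): 0.25}
a = {(0, 0): 1.0}
grid = [math.pi * k / 8 for k in range(9)]
power = [abs(freq_response(b, u, v)) ** 2 for u in grid for v in grid]
lam = 2 / (max(power) + min(power))  # keeps |1 - lam*|B|^2| < 1 on the grid
worst_error = max(
    abs(iterate(freq_response(a, u, v), freq_response(b, u, v), 1.0, 1500,
                lam=lam, prefilter=True)
        - freq_response(a, u, v) / freq_response(b, u, v))
    for u in grid for v in grid)
# Unrelaxed, unprefiltered iteration at (pi, pi): 1 - B = -1.25, so it diverges.
diverged = abs(iterate(1.0, freq_response(b, math.pi, math.pi), 1.0, 60))
```

Here |B|² ranges from 0.0625 to 5.0625, so the prefiltered error shrinks by a factor of about 0.976 per iteration at the worst frequencies; a frequency-varying λ, as the summary mentions, could speed this up.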

Signal reconstruction from the short-time Fourier transform magnitude

Published in:
IEEE-ASSP Int. Conf., 2 May 1982.

Summary

In this paper, a signal is shown to be uniquely represented by the magnitude of its short-time Fourier transform (STFT) under mild restrictions on the signal and the analysis window of the STFT. Furthermore, various algorithms are developed that reconstruct the signal from appropriate samples of the STFT magnitude. Several of the algorithms can also be used to obtain signal estimates from a processed STFT magnitude, which generally does not have a valid short-time structure. These algorithms are successfully applied to the time-scale modification and noise reduction problems in speech processing. Finally, the results presented here have similar potential for other application areas, including those with multidimensional signals.
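
A common way to estimate a signal from a target STFT magnitude is to alternate between imposing the known magnitude on the current estimate's STFT and taking the least-squares inverse, an iteration now widely associated with Griffin and Lim. Whether it matches the algorithms in this paper is an assumption. The Python sketch uses plain DFT loops to stay dependency-free; the window, hop, signal, and function names are illustrative. No iteration can increase the squared magnitude error, which the returned `errors` list lets you check, although convergence to the original signal is not guaranteed.

```python
import cmath
import math

def dft(frame):
    N = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(N))
            for k in range(N)]

def idft(spec):
    N = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * m / N) for k in range(N)) / N
            for m in range(N)]

def stft(x, window, hop):
    """Short-time DFTs of windowed segments starting every `hop` samples."""
    L = len(window)
    return [dft([window[m] * x[s + m] for m in range(L)])
            for s in range(0, len(x) - L + 1, hop)]

def lsee_inverse(frames, window, hop, length):
    """Least-squares signal estimate from a (possibly invalid) STFT."""
    num, den = [0.0] * length, [0.0] * length
    for i, spec in enumerate(frames):
        segment = idft(spec)
        for m, w in enumerate(window):
            num[i * hop + m] += w * segment[m].real
            den[i * hop + m] += w * w
    return [nu / de if de > 0 else 0.0 for nu, de in zip(num, den)]

def reconstruct_from_magnitude(target, window, hop, length, iterations=30):
    """Alternate between imposing the target magnitude and the LS inverse."""
    x = [0.0] * length
    errors = []
    for _ in range(iterations):
        current = stft(x, window, hop)
        errors.append(sum((abs(c) - t) ** 2
                          for cf, tf in zip(current, target)
                          for c, t in zip(cf, tf)))
        # Keep the known magnitude, borrow the phase of the current estimate.
        frames = [[t * (c / abs(c) if abs(c) > 0 else 1.0) for c, t in zip(cf, tf)]
                  for cf, tf in zip(current, target)]
        x = lsee_inverse(frames, window, hop, length)
    return x, errors

# Usage: 16-sample window, hop of 4, 9 frames covering 48 samples.
L, hop, length = 16, 4, 48
window = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / L) for n in range(L)]
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n + 0.4) for n in range(length)]
target = [[abs(c) for c in frame] for frame in stft(signal, window, hop)]
estimate, errors = reconstruct_from_magnitude(target, window, hop, length)
```

The window is offset by half a sample so that it is nonzero at both ends, which keeps the least-squares normalization defined for every output sample.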