Publications
A subband approach to time-scale expansion of complex acoustic signals
Summary
A new approach to time-scale expansion of short-duration complex acoustic signals is introduced. Using a subband signal representation, channel phases are selected to preserve a desired time-scaled temporal envelope. The phase representation is derived from locations of events that occur within filter bank outputs. A frame-based generalization of the method...
Time-scale modification with inconsistent constraints
Summary
A set-theoretic estimation approach is introduced for time-scale modification of complex acoustic signals. The method determines a signal that meets, in a least-squared error sense, desired temporal and spectral envelope constraints that are inconsistent. These constraints are generalized within the set-theoretic framework to include other signal characteristics such...
Military and government applications of human-machine communication by voice
Summary
This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will...
Sine-wave amplitude coding using a mixed LSF/PARCOR representation
Summary
An all-pole model of the speech spectral envelope is used to code the sine-wave amplitudes in the Sinusoidal Transform Coder. While line spectral frequencies (LSFs) are currently used to represent this all-pole model, it is shown that a mixture of line spectral frequencies and partial correlation (PARCOR) coefficients can be...
A comparison of signal processing front ends for automatic word recognition
Summary
This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the...
Measuring fine structure in speech: application to speaker identification
Summary
The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude...
Language identification using phoneme recognition and phonotactic language modeling
Summary
A language identification technique using multiple single-language phoneme recognizers followed by n-gram language models yielded top performance at the March 1994 NIST language identification evaluation. Since the NIST evaluation, work has been aimed at further improving performance by using the acoustic likelihoods emitted from gender-dependent phoneme recognizers to weight the...
The effects of telephone transmission degradations on speaker recognition performance
Summary
The two largest factors affecting automatic speaker identification performance are the size of the population and the degradations introduced by noisy communication channels (e.g., telephone transmission). To examine these two factors experimentally, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both...
Large population speaker identification using clean and telephone speech
Summary
This paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech, and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TIMIT and NTIMIT databases. The TIMIT results...
Robust text-independent speaker identification using Gaussian mixture speaker models
Summary
This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require...