Publications
Recognition by humans and machines: miles to go before we sleep
Summary
Bourlard and his colleagues note that much effort over the past few years has focused on creating large-vocabulary speech recognition systems and reducing error rates measured using clean speech materials. This has led to experimental talker-independent systems with vocabularies of 65,000 words capable of transcribing sentences on a limited set...
Comparison of four approaches to automatic language identification of telephone speech
Summary
We have compared the performance of four approaches for automatic language identification of speech utterances: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone...
A subband approach to time-scale expansion of complex acoustic signals
Summary
A new approach to time-scale expansion of short-duration complex acoustic signals is introduced. Using a subband signal representation, channel phases are selected to preserve a desired time-scaled temporal envelope. The phase representation is derived from locations of events that occur within filter bank outputs. A frame-based generalization of the method...
Time-scale modification with inconsistent constraints
Summary
A set-theoretic estimation approach is introduced for time-scale modification of complex acoustic signals. The method determines a signal that meets, in a least-squared-error sense, desired temporal and spectral envelope constraints that are inconsistent. These constraints are generalized within the set-theoretic framework to include other signal characteristics such...
Military and government applications of human-machine communication by voice
Summary
This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will...
Sine-wave amplitude coding using a mixed LSF/PARCOR representation
Summary
An all-pole model of the speech spectral envelope is used to code the sine-wave amplitudes in the Sinusoidal Transform Coder. While line spectral frequencies (LSFs) are currently used to represent this all-pole model, it is shown that a mixture of line spectral frequencies and partial correlation (PARCOR) coefficients can be...
A comparison of signal processing front ends for automatic word recognition
Summary
This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the...
Measuring fine structure in speech: application to speaker identification
Summary
The performance of systems for speaker identification (SID) can be quite good with clean speech, though much lower with degraded speech. Thus it is useful to search for new features for SID, particularly features that are robust over a degraded channel. This paper investigates features that are based on amplitude...
Language identification using phoneme recognition and phonotactic language modeling
Summary
A language identification technique using multiple single-language phoneme recognizers followed by n-gram language models yielded top performance at the March 1994 NIST language identification evaluation. Since the NIST evaluation, work has been aimed at further improving performance by using the acoustic likelihoods emitted from gender-dependent phoneme recognizers to weight the...
The effects of telephone transmission degradations on speaker recognition performance
Summary
The two largest factors affecting automatic speaker identification performance are the size of the population and the degradations introduced by noisy communication channels (e.g., telephone transmission). To examine these two factors experimentally, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both...