Publications
Predicting, diagnosing, and improving automatic language identification performance
Summary
Language-identification (LID) techniques that use multiple single-language phoneme recognizers followed by n-gram language models have consistently yielded top performance at NIST evaluations. In our study of such systems, we have recently cut our LID error rate by modeling the output of n-gram language models more carefully. Additionally, we are now...
Automatic dialect identification of extemporaneous, conversational, Latin American Spanish Speech
Summary
A dialect identification technique is described that takes as input extemporaneous, conversational speech spoken in Latin American Spanish and produces as output a hypothesis of the dialect. The system has been trained to recognize Cuban and Peruvian dialects of Spanish, but could be extended easily to other dialects (and languages)...
Comparison of four approaches to automatic language identification of telephone speech
Summary
We have compared the performance of four approaches for automatic language identification of speech utterances: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone...
Language identification using phoneme recognition and phonotactic language modeling
Summary
A language identification technique using multiple single-language phoneme recognizers followed by n-gram language models yielded top performance at the March 1994 NIST language identification evaluation. Since the NIST evaluation, work has been aimed at further improving performance by using the acoustic likelihoods emitted from gender-dependent phoneme recognizers to weight the...
The effects of telephone transmission degradations on speaker recognition performance
Summary
The two largest factors affecting automatic speaker identification performance are the size of the population and the degradations introduced by noisy communication channels (e.g., telephone transmission). To examine these two factors experimentally, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both...
Automatic language identification of telephone speech messages using phoneme recognition and N-gram modeling
Summary
This paper compares the performance of four approaches to automatic language identification (LID) of telephone speech messages: Gaussian mixture model classification (GMM), language-independent phoneme recognition followed by language-dependent language modeling (PRLM), parallel PRLM (PRLM-P), and language-dependent parallel phoneme recognition (PPR). These approaches span a wide range of training requirements and...
Digital signal processing applications in cochlear-implant research
Summary
We have developed a facility that enables scientists to investigate a wide range of sound-processing schemes for human subjects with cochlear implants. This digital signal processing (DSP) facility, named the Programmable Interactive System for Cochlear Implant Electrode Stimulation (PISCES), was designed, built, and tested at Lincoln Laboratory and then installed at the...
Automatic language identification using Gaussian mixture and hidden Markov models
Summary
Ergodic, continuous-observation, hidden Markov models (HMMs) were used to perform automatic language classification and detection of speech messages. State observation probability densities were modeled as tied Gaussian mixtures. The algorithm was evaluated on four multilanguage speech databases: a three-language subset of the Spoken Language Library, a three-language subset...
Two-talker pitch tracking for co-channel talker interference suppression
Summary
Almost all co-channel talker interference suppression systems use the difference in the pitches of the target and jammer speakers to suppress the jammer and enhance the target. While joint pitch estimators outputting two pitch estimates as a function of time have been proposed, the task of proper assignment of pitch...
Automatic talker activity labeling for co-channel talker interference suppression
Summary
This paper describes a speaker activity detector taking co-channel speech as input and labeling intervals of the input as target-only, jammer-only, or two-speaker (target+jammer). The algorithms applied were borrowed primarily from speaker recognition, thereby allowing us to use speaker-dependent, test-utterance-independent information in a front-end for co-channel talker interference suppression. Parameters...