Phone tokenization followed by n-gram language modeling has consistently provided good results for the task of language identification. In this paper, this technique is generalized by using Gaussian mixture models as the basis for tokenizing. Performance results are presented for a system employing a GMM tokenizer in conjunction with multiple language processing and score combination techniques. On the 1996 CallFriend LID evaluation set, a 12-way closed set error rate of 17% was obtained.

Language identification using Gaussian mixture model tokenization

Preliminary speaker recognition experiments on the NATO N4 corpus

September 8, 2001

Conference Paper

Author:

Marc A. Zissman

…

Published in:

Proc. Workshop on Multilingual Speech and Language Processing, 8 September 2001.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The NATO N4 corpus contains speech collected at naval training schools within several NATO countries. The speech utterances comprising the corpus are short, tactical transmissions typical of NATO naval communications. In this paper, we report the results of some preliminary speaker recognition experiments on the N4 corpus. We compare the performance of three speaker recognition systems developed at TNO Human Factors, the US Air Force Research Laboratory, Information Directorate and MIT Lincoln Laboratory on the segment of N4 data collected in the Netherlands. Performance is reported as a function of both training and test data duration. We also investigate the impact of cross-language training and testing.

Evaluation of confidence measures for language identification

September 5, 1999

Conference Paper

Author:

Kay M. Berkling

…

Published in:

6th European Conf. on Speech Communication and Technology, EUROSPEECH, 5-9 September 1999.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper we examine various ways to derive confidence measures for a language identification system, using phone recognition followed by language models, and describe the application of an evaluation metric for measuring the "goodness" of the different confidence measures. Experiments are conducted on the 1996 NIST Language Identification Evaluation corpus (derived from the Callfriend corpus of conversational telephone speech). The system is trained on the NIST 96 development data and evaluated on the NIST 96 evaluation data. Results indicate that we are able to predict the performance of a system and quantitatively evaluate how well the prediction holds on new data.

Blind clustering of speech utterances based on speaker and language characteristics

November 30, 1998

Conference Paper

Author:

Douglas A. Reynolds

…

Published in:

5th Int. Conf. Spoken Language Processing (ICSLP), 30 November - 4 December 1998.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Classical speaker and language recognition techniques can be applied to the classification of unknown utterances by computing the likelihoods of the utterances given a set of well trained target models. This paper addresses the problem of grouping unknown utterances when no information is available regarding the speaker or language classes or even the total number of classes. Approaches to blind message clustering are presented based on conventional hierarchical clustering techniques and an integrated cluster generation and selection method called the d* algorithm. Results are presented using message sets derived from the Switchboard and Callfriend corpora. Potential applications include automatic indexing of recorded speech corpora by speaker/language tags and automatic or semiautomatic selection of speaker specific speech utterances for speaker recognition adaptation.

Improving accent identification through knowledge of English syllable structure

November 30, 1998

Conference Paper

Author:

Kay M. Berkling

…

Published in:

5th Int. Conf. on Spoken Language Processing, ICSLP, 30 November - 4 December 1998.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper studies the structure of foreign-accented read English speech. A system for accent identification is constructed by combining linguistic theory with statistical analysis. Results demonstrate that the linguistic theory is reflected in real speech data and its application improves accent identification. The work discussed here combines and applies previous research in language identification based on phonemic features [1] with the analysis of the structure and function of the English language [2]. Working with phonemically hand-labelled data in three accented speaker groups of Australian English (Vietnamese, Lebanese, and native speakers), we show that accents of foreign speakers can be predicted and manifest themselves differently as a function of their position within the syllable. When applying this knowledge, English vs. Vietnamese accent identification improves from 86% to 93% (English vs. Lebanese improves from 78% to 84%). The described algorithm is also applied to automatically aligned phonemes.

Predicting, diagnosing, and improving automatic language identification performance

September 22, 1997

Conference Paper

Author:

Marc A. Zissman

Published in:

5th European Conf. on Speech Communication and Technology, EUROSPEECH, 22-25 September 1997.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Language-identification (LID) techniques that use multiple single-language phoneme recognizers followed by n-gram language models have consistently yielded top performance at NIST evaluations. In our study of such systems, we have recently cut our LID error rate by modeling the output of n-gram language models more carefully. Additionally, we are now able to produce meaningful confidence scores along with our LID hypotheses. Finally, we have developed some diagnostic measures that can predict performance of our LID algorithms.

Automatic dialect identification of extemporaneous, conversational, Latin American Spanish Speech

May 7, 1996

Conference Paper

Author:

Marc A. Zissman

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 2, ICASSP, 7-10 May 1996, pp. 777-780.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

A dialect identification technique is described that takes as input extemporaneous, conversational speech spoken in Latin American Spanish and produces as output a hypothesis of the dialect. The system has been trained to recognize Cuban and Peruvian dialects of Spanish, but could be extended easily to other dialects (and languages) as well. Building on our experience in automatic language identification, the dialect-ID system uses an English phone recognizer trained on the TIMIT corpus to tokenize training speech spoken in each Spanish dialect. Phonotactic language models generated from this tokenized training speech are used during testing to compute dialect likelihoods for each unknown message. This system has an error rate of 16% on the Cuban/Peruvian two-alternative forced-choice test. We introduce the new "Miami" Latin American Spanish speech corpus that is capable of supporting our research into the future.
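As a rough illustration of the phonotactic scoring step described above, the following minimal sketch trains one bigram model per dialect on tokenized training sequences and picks the dialect whose model gives an unknown token sequence the highest log-likelihood. Everything in it is an assumption made for illustration: the toy token sequences, the labels `dialect_a` and `dialect_b`, the function names, and the add-alpha smoothing are hypothetical and are not taken from the paper, which does not specify these details here.

```python
import math
from collections import Counter


def train_bigram_model(sequences, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities.

    Add-alpha smoothing is an assumption of this sketch, not the paper's method.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    vocab_size = len(vocab)

    def log_prob(prev, cur):
        numerator = pair_counts[(prev, cur)] + alpha
        denominator = context_counts[prev] + alpha * vocab_size
        return math.log(numerator / denominator)

    return log_prob


def score(seq, log_prob):
    """Sum the bigram log-probabilities over a token sequence."""
    return sum(log_prob(prev, cur) for prev, cur in zip(seq, seq[1:]))


def identify(seq, models):
    """Pick the label whose model assigns the sequence the highest score."""
    return max(models, key=lambda label: score(seq, models[label]))


# Hypothetical phone-token sequences standing in for tokenizer output.
vocab = ["a", "b", "c"]
training = {
    "dialect_a": [["a", "b", "a", "b", "a", "b"]],
    "dialect_b": [["a", "c", "a", "c", "a", "c"]],
}
models = {label: train_bigram_model(seqs, vocab) for label, seqs in training.items()}

unknown = ["a", "b", "a", "b"]
print(identify(unknown, models))
```

In a real system the token sequences would come from a phone recognizer rather than hand-written lists, and the language models would typically use more careful smoothing or interpolation than the add-alpha estimate used here.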

Comparison of four approaches to automatic language identification of telephone speech

January 1, 1996

Journal Article

Author:

Marc A. Zissman

Published in:

IEEE Trans. Speech Audio Process., Vol. 4, No. 1, January 1996, pp. 31-44.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

We have compared the performance of four approaches for automatic language identification of speech utterances: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone recognition (PPR). These approaches, which span a wide range of training requirements and levels of recognition complexity, were evaluated with the Oregon Graduate Institute Multi-Language Telephone Speech Corpus. Systems containing phone recognizers performed better than the simpler GMM classifier. The top-performing system was parallel PRLM, which exhibited an error rate of 2% for 45-s utterances and 5% for 10-s utterances in two-language, closed-set, forced-choice classification. The error rate for 11-language, closed-set, forced-choice classification was 11% for 45-s utterances and 21% for 10-s utterances.

Language identification using phoneme recognition and phonotactic language modeling

May 9, 1995

Conference Paper

Author:

Marc A. Zissman

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 5, ICASSP, 9-12 May 1995, pp. 3503-3506.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

A language identification technique using multiple single-language phoneme recognizers followed by n-gram language models yielded top performance at the March 1994 NIST language identification evaluation. Since the NIST evaluation, work has been aimed at further improving performance by using the acoustic likelihoods emitted from gender-dependent phoneme recognizers to weight the phonotactic likelihoods output from gender-dependent language models. We have investigated the effect of restricting processing to the most highly discriminating n-grams, and we have also added explicit duration modeling at the phonotactic level. On the OGI Multi-language Telephone Speech Corpus, accuracy on an 11-language identification task has risen to 89% on 45-s utterances and 79% on 10-s utterances. Two-language classification accuracy is 98% and 95% for the 45-s and 10-s utterances, respectively. Finally, we have started to apply these same techniques to the problem of dialect identification.

Automatic language identification of telephone speech messages using phoneme recognition and N-gram modeling

April 19, 1994

Conference Paper

Author:

Marc A. Zissman

…

Elliot Singer

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Speech Processing, 19-22 April 1994, pp. 305-308.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper compares the performance of four approaches to automatic language identification (LID) of telephone speech messages: Gaussian mixture model classification (GMM), language-independent phoneme recognition followed by language-dependent language modeling (PRLM), parallel PRLM (PRLM-P), and language-dependent parallel phoneme recognition (PPR). These approaches span a wide range of training requirements and levels of recognition complexity. All approaches were tested on the development test subset of the OGI multi-language telephone speech corpus. Generally, system performance was directly related to system complexity, with PRLM-P and PPR performing best. On 45-second test utterances, average two-language, closed-set, forced-choice classification performance reached 94.5% correct. The best 10-language, closed-set, forced-choice performance was 79.2% correct.
