Predicting, diagnosing, and improving automatic language identification performance

September 22, 1997

Conference Paper

Author:

Marc A. Zissman

Published in:

5th European Conf. on Speech Communication and Technology, EUROSPEECH, 22-25 September 1997.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

Language-identification (LID) techniques that use multiple single-language phoneme recognizers followed by n-gram language models have consistently yielded top performance at NIST evaluations. In our study of such systems, we have recently reduced our LID error rate by modeling the output of the n-gram language models more carefully. We are also now able to produce meaningful confidence scores along with our LID hypotheses. Finally, we have developed diagnostic measures that can predict the performance of our LID algorithms.
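To make the general idea concrete, the following is a minimal sketch of scoring a phone sequence with per-language n-gram models. It is not the system described in the paper. It assumes the phone sequences have already been produced by a single phoneme recognizer (the paper's systems use several), uses add-alpha smoothed bigrams, and derives a crude confidence value from a softmax over the per-language scores. The function names, the toy phone inventory, and the labels `lang_x` and `lang_y` are all hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-utterance symbol


def train_bigram(sequences, vocab_size, alpha=1.0):
    """Build an add-alpha smoothed bigram log-probability function."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def logprob(prev, cur):
        num = pair_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return logprob


def score(seq, logprob):
    """Average bigram log-likelihood per phone (length-normalized)."""
    padded = [START] + list(seq)
    total = sum(logprob(p, c) for p, c in zip(padded, padded[1:]))
    return total / max(len(seq), 1)


def identify(seq, models):
    """Return (best language, softmax confidence, all scores).

    The softmax confidence is an illustrative stand-in, not the
    confidence measure developed in the paper.
    """
    scores = {lang: score(seq, lp) for lang, lp in models.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    norm = sum(math.exp(s - top) for s in scores.values())
    return best, 1.0 / norm, scores


# Toy training data: made-up phone strings for two made-up languages.
VOCAB = ["a", "b", "k", "t"]
models = {
    "lang_x": train_bigram([["k", "a", "t", "a"], ["t", "a", "k", "a"]], len(VOCAB)),
    "lang_y": train_bigram([["a", "b", "b", "a"], ["b", "a", "b", "b"]], len(VOCAB)),
}

best, confidence, scores = identify(["k", "a", "t"], models)
print(best, round(confidence, 3))
```

In a system with several phoneme recognizers, one set of such models would be trained per recognizer and the resulting per-language scores combined before the final decision. How that combination is done is exactly the kind of modeling choice the summary refers to, and this sketch leaves it open.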