Summary
One commonly used approach to language recognition is to convert the input speech into a sequence of tokens, such as words or phones, and then use these token sequences to determine the target language. Classification is typically performed by extracting N-gram statistics from the token sequences and then applying an N-gram language model or a support vector machine (SVM). One problem with these approaches is that the number of possible N-grams grows exponentially as the order N is increased. This is especially problematic for an SVM classifier, since each utterance is represented as a distinct N-gram vector. In this paper we propose a novel approach for modeling higher-order N-grams with an SVM via an alternating filter-wrapper feature selection method. We demonstrate the effectiveness of this technique on the NIST 2007 language recognition task.
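As a minimal illustration of the dimensionality problem described above, the following sketch counts the N-grams observed in a single (hypothetical) phone token sequence and compares that with the number of possible N-gram dimensions for an assumed phone inventory size; the token sequence, the inventory size, and the helper name ngram_counts are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the order-n N-grams occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical phone sequence produced by a phone recognizer (illustrative only).
utterance = ["ah", "b", "k", "ah", "t", "s", "ah", "b"]

for n in (1, 2, 3):
    counts = ngram_counts(utterance, n)
    print(f"order {n}: {len(counts)} distinct N-grams observed in this utterance")

# An SVM N-gram vector has one dimension per possible N-gram, so its size
# grows as V**n for a phone inventory of size V (V = 50 is assumed here).
V = 50
for n in (1, 2, 3, 4):
    print(f"order {n}: up to {V**n:,} possible N-gram dimensions")
```

The gap between the few N-grams actually observed per utterance and the exponentially many possible dimensions is what motivates selecting a small subset of informative higher-order N-grams rather than representing them all.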