Publications

A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 2542-2545.

Summary

The success of support vector machines (SVMs) for classification problems is often dependent on an appropriate normalization of the input feature space. This is particularly true in topic identification, where the relative contribution of the common but uninformative function words can overpower the contribution of the rare but informative content words in the SVM kernel function score if the feature space is not normalized properly. In this paper we apply the discriminative minimum classification error (MCE) training approach to the problem of learning an appropriate feature space normalization for use with an SVM classifier. Results are presented showing significant error rate reductions for an SVM-based system on a topic identification task using the Fisher corpus of audio recordings of human conversations.
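The normalization problem described in this summary can be illustrated with a small sketch. It is not the MCE-trained normalization from the paper: inverse document frequency (IDF) is used here only as a simple stand-in for a learned per-feature weight, and the three toy documents and all function names are assumptions made for illustration.

```python
import math

def count_vector(text, vocab):
    """Raw word-count vector for one document over a fixed vocabulary."""
    words = text.split()
    return [words.count(w) for w in vocab]

def idf_weights(texts, vocab):
    """Stand-in feature weights: log(N / document frequency) per word."""
    n = len(texts)
    return [math.log(n / sum(1 for t in texts if w in t.split())) for w in vocab]

def scale(vec, weights):
    """Apply a per-feature normalization weight to a count vector."""
    return [w * x for w, x in zip(weights, vec)]

def dot(x, y):
    """Linear kernel score between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

# Toy documents (hypothetical): two about sports, one about finance.
texts = [
    "the the the of of game team",   # sports A
    "the of game team score",        # sports B
    "the the the of of bank loan",   # finance
]
vocab = sorted({w for t in texts for w in t.split()})
sports_a, sports_b, finance = (count_vector(t, vocab) for t in texts)

# Unnormalized: the function words "the" and "of" dominate the score,
# so sports A looks closer to the finance document than to sports B.
raw_same_topic = dot(sports_a, sports_b)    # 7
raw_cross_topic = dot(sports_a, finance)    # 13

# Weighted: words that occur in every document get weight 0,
# so only the content words contribute to the kernel score.
weights = idf_weights(texts, vocab)
norm_same_topic = dot(scale(sports_a, weights), scale(sports_b, weights))
norm_cross_topic = dot(scale(sports_a, weights), scale(finance, weights))
```

With raw counts the cross-topic pair scores higher than the same-topic pair; after weighting the order reverses. The paper learns the weights discriminatively rather than fixing them by a formula such as IDF.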

Language recognition with discriminative keyword selection

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 31 March - 4 April 2008, pp. 4145-4148.

Summary

One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and then using an N-gram language model or support vector machine (SVM) to perform the classification. One problem with these approaches is that the number of N-grams grows exponentially as the order N is increased. This is especially problematic for an SVM classifier as each utterance is represented as a distinct N-gram vector. In this paper we propose a novel approach for modeling higher order N-grams using an SVM via an alternating filter-wrapper feature selection method. We demonstrate the effectiveness of this technique on the NIST 2007 language recognition task.
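The N-gram statistics and the exponential growth mentioned in this summary can be sketched as follows. This is only an illustration of counting N-grams over a token sequence; it does not implement the paper's filter-wrapper feature selection, and the function names and the 40-phone inventory are assumptions.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in the sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def possible_ngrams(vocab_size, n):
    """Upper bound on distinct N-grams: one vector dimension per N-gram."""
    return vocab_size ** n

# Hypothetical phone token sequence for one utterance.
tokens = ["a", "b", "a", "b"]
bigrams = ngram_counts(tokens, 2)

# With an assumed inventory of 40 phones, the feature space grows
# by a factor of 40 each time the order N is increased by one.
sizes = [possible_ngrams(40, n) for n in (1, 2, 3, 4)]
```

Each utterance becomes a vector with one dimension per possible N-gram, which is why higher orders quickly become impractical without some form of feature selection.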

Topic identification from audio recordings using word and phone recognition lattices

Published in:
2007 IEEE Workshop on Automatic Speech Recognition and Understanding, 9-13 December 2007, pp. 659-664.

Summary

In this paper, we investigate the problem of topic identification from audio documents using features extracted from speech recognition lattices. We are particularly interested in the difficult case where the training material is minimally annotated with only topic labels. Under this scenario, the lexical knowledge that is useful for topic identification may not be available, and automatic methods for extracting linguistic knowledge useful for distinguishing between topics must be relied upon. Towards this goal we investigate the problem of topic identification on conversational telephone speech from the Fisher corpus under a variety of increasingly difficult constraints. We contrast the performance of systems that have knowledge of the lexical units present in the audio data, against systems that rely entirely on phonetic processing.

Language recognition with word lattices and support vector machines

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 15-20 April 2007, Vol. IV, pp. 989-992.

Summary

Language recognition is typically performed with methods that exploit phonotactics--a phone recognition language modeling (PRLM) system. A PRLM system converts speech to a lattice of phones and then scores a language model. A standard extension to this scheme is to use multiple parallel phone recognizers (PPRLM). In this paper, we modify this approach in two distinct ways. First, we replace the phone tokenizer by a powerful speech-to-text system. Second, we use a discriminative support vector machine for language modeling. Our goals are twofold. First, we explore the ability of a single speech-to-text system to distinguish multiple languages. Second, we fuse the new system with an SVM PRLM system to see if it complements current approaches. Experiments on the 2005 NIST language recognition corpus show the new word system accomplishes these goals and has significant potential for language recognition.
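The baseline PRLM idea in this summary, scoring a phone sequence against one language model per language, can be sketched roughly as below. It is a simplified illustration, not the paper's system: it uses a single best phone string instead of a lattice, an add-one smoothed bigram model instead of an SVM, and toy training data; all names are assumptions.

```python
import math
from collections import Counter

def train_bigram_lm(sequences, phone_set):
    """Return a log-probability function for an add-one smoothed bigram model."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    size = len(phone_set)

    def log_prob(prev, cur):
        return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + size))

    return log_prob

def score(seq, log_prob):
    """Log-likelihood of a phone sequence under one language's model."""
    return sum(log_prob(p, c) for p, c in zip(seq, seq[1:]))

def recognize(seq, models):
    """Pick the language whose model gives the highest score."""
    return max(models, key=lambda lang: score(seq, models[lang]))

# Toy phone inventory and training strings for two hypothetical languages.
phone_set = ["a", "k", "s", "t"]
models = {
    "X": train_bigram_lm([["k", "a", "t", "a"], ["t", "a", "k", "a"]], phone_set),
    "Y": train_bigram_lm([["s", "t", "s", "k"], ["k", "s", "t", "s"]], phone_set),
}

guess_1 = recognize(["t", "a", "k"], models)
guess_2 = recognize(["s", "t", "s"], models)
```

A PPRLM system would run several phone recognizers in parallel and combine the resulting scores; the paper instead replaces the phone tokenizer with a speech-to-text system and the language model with an SVM.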