MCE training techniques for topic identification of spoken audio documents

November 1, 2011

Journal Article

Author:

Timothy J. Hazen

Published in:

IEEE Trans. Audio, Speech, Language Proc., Vol. 19, No. 8, November 2011, pp. 2451-2461.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

In this paper, we discuss the use of minimum classification error (MCE) training as a means of improving traditional approaches to topic identification, such as naive Bayes classifiers and support vector machines. A key element of our new MCE training techniques is their ability to efficiently apply jackknifing, or leave-one-out training, to yield models that generalize better to unseen data. Experiments were conducted on recorded human-human telephone conversations from the Fisher Corpus, with feature vectors derived from word-based automatic speech recognition lattices. The new MCE training techniques yielded sizeable improvements in topic identification accuracy.
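
As a rough illustration of why leave-one-out scoring can be cheap for a count-based model, the sketch below subtracts a training document's own word counts from its topic's totals before scoring that document, so the model never has to be retrained from scratch. This is not the paper's algorithm: the function names, the add-alpha smoothing, and the uniform topic prior are all assumptions made for illustration, and the MCE objective itself is not shown.

```python
import math
from collections import Counter

def train_counts(docs):
    """Accumulate per-topic word counts from (topic, words) pairs."""
    counts = {}
    for topic, words in docs:
        counts.setdefault(topic, Counter()).update(words)
    return counts

def log_score(words, topic_counts, vocab_size, alpha=1.0):
    """Naive Bayes log-likelihood of words under one topic (add-alpha smoothing, assumed)."""
    total = sum(topic_counts.values())
    return sum(
        math.log((topic_counts[w] + alpha) / (total + alpha * vocab_size))
        for w in words
    )

def leave_one_out_predict(counts, vocab_size, held_topic, held_words):
    """Score a training document with its own counts removed from its topic.

    Uniform topic priors are assumed, so only the likelihood term is compared.
    """
    adjusted = dict(counts)  # shallow copy; only the held-out topic is replaced
    adjusted[held_topic] = counts[held_topic] - Counter(held_words)
    return max(adjusted, key=lambda t: log_score(held_words, adjusted[t], vocab_size))
```

The counts are accumulated once with `train_counts`, and each held-out prediction then costs only one counter subtraction plus the usual scoring pass.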