MCE training techniques for topic identification of spoken audio documents
November 1, 2011
Journal Article
Published in:
IEEE Trans. Audio, Speech, Language Proc., Vol. 19, No. 8, November 2011, pp. 2451-2461.
Summary
In this paper, we discuss the use of minimum classification error (MCE) training as a means of improving traditional approaches to topic identification, such as naive Bayes classifiers and support vector machines. A key element of our new MCE training techniques is their ability to efficiently apply jackknifing, or leave-one-out training, to yield improved models that generalize better to unseen data. Experiments were conducted on recorded human-human telephone conversations from the Fisher Corpus, using feature vector representations derived from word-based automatic speech recognition lattices. Sizeable improvements in topic identification accuracy were observed with the new MCE training techniques.
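As a rough illustration of the two ideas named in the summary, the sketch below scores each training document against a toy multinomial naive Bayes model with that document's own counts subtracted (a cheap form of leave-one-out), then evaluates a sigmoid-smoothed classification error of the kind commonly used in MCE training. It is not the paper's algorithm: the function names, the add-alpha smoothing, the clamping constant, and the `gamma` slope parameter are all assumptions made for this example, and no parameter update step is shown.

```python
import math
from collections import Counter


def train_counts(docs, labels):
    """Accumulate per-topic word counts from tokenized documents."""
    counts = {}
    for words, topic in zip(docs, labels):
        counts.setdefault(topic, Counter()).update(words)
    return counts


def log_score(words, topic_counts, vocab_size, held_out=None, alpha=1.0):
    """Add-alpha smoothed multinomial log-likelihood of words under one topic.

    If held_out is given, its counts are subtracted first, which emulates
    leave-one-out training without rebuilding the model (assumed scheme).
    """
    held_out = held_out or Counter()
    total = sum(topic_counts.values()) - sum(held_out.values())
    denom = total + alpha * vocab_size
    return sum(
        math.log((topic_counts[w] - held_out[w] + alpha) / denom) for w in words
    )


def mce_loss(scores, true_topic, gamma=1.0):
    """Sigmoid of (best rival score - true score): a smooth 0/1 error."""
    best_rival = max(s for t, s in scores.items() if t != true_topic)
    margin = gamma * (best_rival - scores[true_topic])
    margin = max(-50.0, min(50.0, margin))  # clamp to avoid exp overflow
    return 1.0 / (1.0 + math.exp(-margin))


def loo_mce_losses(docs, labels, gamma=1.0):
    """Per-document smoothed error, holding each document out of its own topic."""
    counts = train_counts(docs, labels)
    vocab_size = len({w for words in docs for w in words})
    losses = []
    for words, topic in zip(docs, labels):
        own = Counter(words)
        scores = {
            t: log_score(words, c, vocab_size, own if t == topic else None)
            for t, c in counts.items()
        }
        losses.append(mce_loss(scores, topic, gamma))
    return losses


if __name__ == "__main__":
    # Hypothetical toy data: two topics, two short "documents" each.
    docs = [["a", "a", "b"], ["a", "b", "a"], ["c", "c", "d"], ["c", "d", "d"]]
    labels = ["x", "x", "y", "y"]
    print(loo_mce_losses(docs, labels))
```

Subtracting a document's counts before scoring it means the model is never credited for having memorized that document, so the loss reflects how it would behave on unseen data. In a full MCE trainer, the gradient of this smoothed loss with respect to the model parameters would drive the updates.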