Topic identification from audio recordings using word and phone recognition lattices

December 9, 2007

Conference Paper

Author:

Timothy J. Hazen

…

Published in:

2007 IEEE Workshop on Automatic Speech Recognition and Understanding, 9-13 December 2007, pp. 659-664.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

In this paper, we investigate the problem of topic identification from audio documents using features extracted from speech recognition lattices. We are particularly interested in the difficult case where the training material is minimally annotated with only topic labels. Under this scenario, the lexical knowledge that is useful for topic identification may not be available, so automatic methods for extracting linguistic knowledge that distinguishes between topics must be relied upon. Toward this goal, we investigate topic identification on conversational telephone speech from the Fisher corpus under a variety of increasingly difficult constraints. We contrast the performance of systems that have knowledge of the lexical units present in the audio data with that of systems that rely entirely on phonetic processing.
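
As a rough illustration of what "features extracted from speech recognition lattices" can look like in practice, the sketch below sums arc posteriors into expected unit counts and feeds them to a simple classifier. It is a minimal sketch under stated assumptions, not the method described in the paper: the summary does not name a classifier, so the naive Bayes model with a uniform topic prior, the add-one smoothing, the (label, posterior) arc representation, and all function names and toy data here are hypothetical. The labels could equally be words or phone n-grams.

```python
import math
from collections import defaultdict


def expected_counts(lattice_arcs):
    """Sum arc posteriors per unit label (a word or a phone n-gram).

    lattice_arcs is an assumed representation: an iterable of
    (label, posterior) pairs taken from a recognition lattice.
    """
    counts = defaultdict(float)
    for label, posterior in lattice_arcs:
        counts[label] += posterior
    return dict(counts)


def train_naive_bayes(docs_by_topic, smoothing=1.0):
    """Estimate log P(unit | topic) from topic-labeled expected counts.

    docs_by_topic maps a topic label to a list of count dictionaries,
    one per training recording. Only topic labels are required.
    """
    vocab = {u for docs in docs_by_topic.values() for d in docs for u in d}
    model = {}
    for topic, docs in docs_by_topic.items():
        totals = defaultdict(float)
        for d in docs:
            for unit, count in d.items():
                totals[unit] += count
        denom = sum(totals.values()) + smoothing * len(vocab)
        model[topic] = {
            u: math.log((totals[u] + smoothing) / denom) for u in vocab
        }
    return model


def classify(model, counts):
    """Return the topic with the highest count-weighted log-likelihood.

    Assumes a uniform prior over topics; units never seen in training
    are ignored.
    """
    def score(topic):
        logp = model[topic]
        return sum(c * logp[u] for u, c in counts.items() if u in logp)

    return max(model, key=score)


# Toy, made-up data purely for illustration.
train = {
    "pets": [expected_counts([("dog", 0.9), ("cat", 0.8), ("the", 1.0)])],
    "sports": [expected_counts([("game", 0.9), ("team", 0.7), ("the", 1.0)])],
}
model = train_naive_bayes(train)
print(classify(model, expected_counts([("dog", 0.6), ("the", 0.9)])))
```

Using fractional expected counts rather than the single best hypothesis lets uncertain recognitions still contribute evidence, which is one plausible reason to work from lattices. The same code applies unchanged to a phonetic-only system if the labels are phone n-grams instead of words.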