Speaker diarisation for broadcast news

May 31, 2004

Conference Paper

Author:

Sue E. Tranter

…

Douglas A. Reynolds

Published in:

Odyssey 2004, 31 May - 4 June 2004.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

It is often important to be able to automatically label 'who spoke when' in an audio recording. This paper describes two systems for audio segmentation, developed at CUED and MIT-LL, and evaluates their performance using the speaker diarisation score defined in the 2003 Rich Transcription Evaluation. A new clustering procedure and BIC-based stopping criterion are introduced for the CUED system, improving both performance and robustness to changes in segmentation. Finally, a hybrid 'Plug and Play' system is built that combines different parts of the CUED and MIT-LL systems into a single system that outperforms both individual systems.
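To make the idea of a BIC-based stopping criterion concrete, here is a minimal sketch of the textbook delta-BIC merge test used in agglomerative speaker clustering. It is not the criterion from the paper, whose details the summary does not give. The sketch assumes one-dimensional features and a single Gaussian per cluster, and the names `variance`, `delta_bic`, `should_merge` and `penalty_weight` are hypothetical. Under these assumptions, clustering would stop once no pair of clusters passes the merge test.

```python
import math


def variance(xs):
    """Maximum-likelihood variance of a list of floats, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    return max(sum((x - mean) ** 2 for x in xs) / n, 1e-8)


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for modelling x and y separately versus as one cluster.

    Assumption: 1-D features, one Gaussian per cluster (illustrative only).
    A negative value means the two-cluster model is not justified.
    """
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (
        n * math.log(variance(x + y))
        - n_x * math.log(variance(x))
        - n_y * math.log(variance(y))
    )
    # Two Gaussians need one extra mean and one extra variance in 1-D.
    extra_params = 2
    penalty = 0.5 * penalty_weight * extra_params * math.log(n)
    return gain - penalty


def should_merge(x, y, penalty_weight=1.0):
    """Merge when splitting the data is not worth the extra parameters."""
    return delta_bic(x, y, penalty_weight) < 0.0


same_a = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02]
same_b = [0.01, -0.02, 0.08, -0.07, 0.03, -0.04]
other = [5.0, 5.1, 4.9, 5.05, 4.95, 5.02]

print(should_merge(same_a, same_b))  # clusters with similar statistics
print(should_merge(same_a, other))   # clusters with well-separated means
```

Real systems typically work on multi-dimensional cepstral features with full-covariance Gaussians, in which case the variance terms become log-determinants of covariance matrices and the parameter count grows with the feature dimension.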