Blind clustering of speech utterances based on speaker and language characteristics

November 30, 1998

Conference Paper

Author:

Douglas A. Reynolds

…

Published in:

5th Int. Conf. Spoken Language Processing (ICSLP), 30 November - 4 December 1998.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Blind clustering of speech utterances based on speaker and language characteristics

Summary

Classical speaker and language recognition techniques can be applied to the classification of unknown utterances by computing the likelihoods of the utterances given a set of well trained target models. This paper addresses the problem of grouping unknown utterances when no information is available regarding the speaker or language classes or even the total number of classes. Approaches to blind message clustering are presented based on conventional hierarchical clustering techniques and an integrated cluster generation and selection method called the d* algorithm. Results are presented using message sets derived from the Switchboard and Callfriend corpora. Potential applications include automatic indexing of recorded speech corpora by speaker/language tags and automatic or semiautomatic selection of speaker specific speech utterances for speaker recognition adaptation.