Summary
Discriminatively trained support vector machines (SVMs) have recently been introduced as a novel approach to speaker recognition. SVMs bring a distinctly different modeling strategy to the problem: the standard Gaussian mixture model (GMM) approach models the probability density of the speaker and the background (a generative approach), whereas the SVM models the boundary between the classes. Another notable aspect of the SVM is that it does not directly produce probabilistic scores, which poses a challenge when combining its results with those of a GMM. We therefore propose strategies for fusing the two approaches and show that the SVM and GMM are complementary technologies. Recent evaluations by NIST (telephone data) and NFI/TNO (forensic data) provide a unique opportunity to test the robustness and viability of fusing GMM and SVM methods. We show that fusion produces a system with relative error rates up to 23% lower than either individual system.
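The summary does not spell out the fusion strategies themselves; as an illustration only, one common family of score-level fusion methods normalizes each system's scores and then combines them with a weighted sum. The function names and the weight parameter below are hypothetical, not taken from the paper:

```python
from statistics import mean, stdev

def znorm(scores):
    # Zero-mean, unit-variance normalization so that scores from
    # systems with different dynamic ranges become comparable.
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

def fuse(gmm_scores, svm_scores, w=0.5):
    # Weighted linear fusion of normalized GMM and SVM scores.
    # w controls the relative contribution of the GMM system;
    # in practice it would be tuned on a held-out development set.
    g = znorm(gmm_scores)
    v = znorm(svm_scores)
    return [w * a + (1 - w) * b for a, b in zip(g, v)]
```

In a real system the fused score would then be compared against a decision threshold calibrated for the target application.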