Using deep belief networks for vector-based speaker recognition

September 14, 2014

Conference Paper

Author:

William M. Campbell

Published in:

INTERSPEECH 2014: 15th Annual Conf. of the Int. Speech Communication Assoc., 14-18 September 2014.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

Deep belief networks (DBNs) have become a successful approach for acoustic modeling in speech recognition. DBNs exhibit strong approximation properties, deliver improved performance, and are parameter efficient. In this work, we propose methods for applying DBNs to speaker recognition. In contrast to prior work, our approach to DBNs for speaker recognition starts at the acoustic modeling layer. We use sparse-output DBNs trained with both unsupervised and supervised methods to generate statistics for use in standard vector-based speaker recognition methods. We show that a DBN can replace a Gaussian mixture model universal background model (GMM UBM) in this processing. Methods, qualitative analysis, and results are given on a NIST SRE 2012 task. Overall, our results show that DBNs achieve performance competitive with modern approaches in an initial implementation of our framework.
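
As a rough illustration of what "generating statistics" can mean here, the sketch below accumulates zeroth- and first-order statistics from per-frame class posteriors, the quantities that vector-based speaker recognition systems conventionally derive from GMM UBM posteriors. It assumes the network's output layer yields a posterior vector per frame (for example via a softmax); the function names, the toy data, and the choice of plain Python lists are illustrative assumptions and are not taken from the paper.

```python
import math

def softmax(scores):
    """Convert one frame's output-layer scores into posteriors that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sufficient_stats(frames, posteriors):
    """Accumulate zeroth- and first-order statistics over one utterance.

    frames:     list of T feature vectors, each of dimension D
    posteriors: list of T posterior vectors, each over C classes
    Returns (N, F) with N[c] = sum_t p_t(c) and F[c] = sum_t p_t(c) * x_t.
    """
    num_classes = len(posteriors[0])
    dim = len(frames[0])
    N = [0.0] * num_classes
    F = [[0.0] * dim for _ in range(num_classes)]
    for x, p in zip(frames, posteriors):
        for c, weight in enumerate(p):
            if weight == 0.0:
                continue  # sparse posteriors: inactive classes contribute nothing
            N[c] += weight
            for d in range(dim):
                F[c][d] += weight * x[d]
    return N, F

# Toy example (hypothetical data): 3 frames, 2-dimensional features, 2 classes.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
N, F = sufficient_stats(frames, posteriors)
```

Because the accumulation only needs a posterior vector per frame, the source of those posteriors (a GMM UBM or a network output layer) can in principle be swapped without changing the downstream vector-based processing.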