A unified deep neural network for speaker and language recognition
September 6, 2015
Significant performance gains have been reported separately for speaker recognition (SR) and language recognition (LR) using either deep neural network (DNN) posteriors of sub-phonetic units or DNN feature representations. However, the two techniques have not been compared on the same SR or LR task, or across SR and LR tasks using the same DNN. In this work, we apply a single DNN to both tasks, using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and NIST 2011 Language Recognition Evaluation (LRE11) benchmarks. With a single DNN trained on Switchboard data, we demonstrate large gains on both benchmarks: a 55% reduction in equal error rate (EER) for the DAC13 out-of-domain condition and a 48% reduction in Cavg on the LRE11 30s test condition. We also investigate score fusion and feature fusion, as well as the performance of the DNN technologies at short durations for SR.
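The gains above are quoted as relative reductions in an error metric. As a rough illustration of how such a figure could be computed for EER, the following is a minimal sketch and not the evaluation tooling used in this work. The function names `equal_error_rate` and `relative_reduction`, the threshold sweep, and all scores and error rates are hypothetical and chosen only for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by using every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the miss rate and the false-alarm rate
    are closest, and reported as the mean of the two rates.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            best_gap, eer = gap, (miss + fa) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, as a fraction of the baseline."""
    return (baseline - improved) / baseline


# Synthetic scores (assumption: higher score means "more likely a target").
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(targets, nontargets)

# Made-up error rates in percent, not results from this work.
toy_gain = relative_reduction(8.0, 4.0)
```

With these synthetic inputs, one target trial and one non-target trial are misclassified at the crossing threshold, so `toy_eer` is 0.25, and halving an error rate gives a `toy_gain` of 0.5. Evaluation toolkits typically interpolate between operating points rather than picking the nearest threshold, so this sweep is only an approximation on small trial lists.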