Bayesian estimation of PLDA with noisy training labels, with applications to speaker verification
May 4, 2020
2020 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 4-8 May 2020.
This paper proposes a method for Bayesian estimation of probabilistic linear discriminant analysis (PLDA) when training labels are noisy. Label errors can be expected in, e.g., large or distributed data collections, or in crowd-sourced data labeling. Treating the true labels as latent random variables, the method models the observed labels as outputs of a discrete memoryless channel and derives the maximum a posteriori (MAP) estimate of the PLDA model via Variational Bayes. The proposed framework can be used for PLDA estimation, for PLDA domain adaptation, or to infer the reliability of a PLDA training list. Although the method is general, the paper discusses specific applications to speaker verification. On the Speakers in the Wild (SITW) task, the proposed method degrades gracefully when label errors are introduced into the training or domain adaptation lists. On the NIST 2018 Speaker Recognition Evaluation (SRE18) task, whose adaptation data have noisy speaker labels, the proposed technique improves performance relative to unsupervised domain adaptation.
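To make the channel view of label noise concrete, the sketch below computes, for a single segment, the posterior over its latent true label given its observed label and per-class likelihoods. It is a minimal illustration under stated assumptions, not the paper's Variational Bayes update: it assumes a hypothetical symmetric channel (the true label is kept with probability 1 - error_rate and otherwise flipped uniformly to another class), assumes the per-class PLDA log-likelihoods log p(x | class k) are already available as plain numbers, and the function names `symmetric_channel` and `true_label_posterior` are hypothetical.

```python
import math


def symmetric_channel(num_classes, error_rate):
    """Hypothetical symmetric discrete memoryless channel.

    Returns P[observed][true]: the true label is passed through with
    probability 1 - error_rate, otherwise replaced uniformly by one of
    the other num_classes - 1 labels.
    """
    # Strict bounds keep every entry positive, so math.log is always defined.
    assert num_classes >= 2 and 0.0 < error_rate < 1.0
    off = error_rate / (num_classes - 1)
    return [[1.0 - error_rate if o == t else off for t in range(num_classes)]
            for o in range(num_classes)]


def true_label_posterior(observed, log_likelihoods, channel, prior=None):
    """Posterior q(true = k | observed, x) proportional to
    prior_k * P(observed | true = k) * p(x | true = k).

    log_likelihoods[k] is assumed to be log p(x | class k), e.g. from a
    PLDA model held fixed for this step.
    """
    num_classes = len(log_likelihoods)
    if prior is None:
        prior = [1.0 / num_classes] * num_classes
    log_scores = [math.log(prior[k]) + math.log(channel[observed][k]) + log_likelihoods[k]
                  for k in range(num_classes)]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(log_scores)
    weights = [math.exp(s - peak) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Segment labeled as speaker 0, but its embedding fits speaker 1 much better.
    ll = [-5.0, -1.0, -6.0]
    print(true_label_posterior(0, ll, symmetric_channel(3, 0.1)))
    print(true_label_posterior(0, ll, symmetric_channel(3, 0.001)))
```

With a 10% error rate the evidence from the embedding overrides the observed label, whereas with a 0.1% error rate the observed label dominates; this is the trade-off a channel model lets the estimator weigh when deciding how much to trust each training label.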