Summary
In speaker verification, score calibration is used to transform verification scores into log-likelihood ratios (LLRs), which are statistically interpretable. Conventional calibration techniques apply a single global score transform. In condition-aware (CA) calibration, by contrast, information about the signal conditions is provided as an additional input, which allows the calibration to adapt to those conditions. This paper explores a generative approach to condition-aware score calibration. It proposes a novel generative model for speaker verification trials, each of which comprises a trial score, a trial label, and the associated pair of speaker embeddings. Trials are assumed to be drawn from a discrete set of underlying signal conditions, modeled as latent categorical random variables, so that trial scores and speaker embeddings are drawn from condition-dependent distributions. An Expectation-Maximization (EM) algorithm for estimating the parameters of the proposed model is presented; it does not require condition labels and instead discovers relevant conditions in an unsupervised manner. The generative condition-aware (GCA) calibration transform is then derived as the log-likelihood ratio of a verification score given the observed pair of embeddings. Experimental results show that the proposed approach improves performance on a variety of speaker verification tasks, outperforming both static and condition-aware baseline calibration methods. GCA calibration is observed to improve the discriminative ability of the speaker verification system as well as to provide good calibration performance across a range of operating points. These benefits are observed for task-dependent models where signal conditions are known, for universal models that are robust across a range of conditions, and when facing unseen signal conditions.
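The summary does not specify the condition-dependent distributions, so the following Python/NumPy sketch is only a rough illustration of the idea under loudly stated assumptions. It is not the paper's model. It assumes Gaussian scores per (condition, label) pair, a diagonal Gaussian per condition from which both embeddings of a trial are drawn independently, and a trial label that is independent of the condition. Under these assumptions, EM fits the model without condition labels, and the calibrated output is LLR(s) = log Σ_k p(k | e1, e2) p(s | k, target) − log Σ_k p(k | e1, e2) p(s | k, non-target). The names `GCASketch`, `log_gauss`, `fit`, and `llr` are hypothetical.

```python
import numpy as np


def log_gauss(x, mean, var):
    """Elementwise log N(x | mean, var) with broadcasting."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def logsumexp(a):
    """Numerically stable log-sum-exp over the last axis."""
    m = np.max(a, axis=-1, keepdims=True)
    return m[..., 0] + np.log(np.sum(np.exp(a - m), axis=-1))


class GCASketch:
    """Hypothetical K-condition generative calibration model (illustration only).

    Assumed model: condition k ~ Categorical(pi); both embeddings of a trial are
    drawn independently from N(mu[k], diag(var[k])); the score given (k, label y)
    is N(m[k, y], v[k, y]); the label is independent of the condition.
    """

    def __init__(self, n_conditions, seed=0):
        self.K = n_conditions
        self.rng = np.random.default_rng(seed)

    def _log_emb(self, e1, e2):
        # log p(e1, e2 | k) for every trial and condition, shape (N, K)
        ll = log_gauss(e1[:, None, :], self.mu, self.var).sum(-1)
        return ll + log_gauss(e2[:, None, :], self.mu, self.var).sum(-1)

    def fit(self, s, y, e1, e2, n_iter=50):
        """EM estimation; y holds 1 for target and 0 for non-target trials."""
        N, K = len(s), self.K
        # Farthest-point initialisation of the embedding means.
        centres = [e1[self.rng.integers(N)]]
        for _ in range(K - 1):
            d = np.min([((e1 - c) ** 2).sum(1) for c in centres], axis=0)
            centres.append(e1[np.argmax(d)])
        self.mu = np.array(centres)
        self.var = np.tile(np.var(np.vstack([e1, e2]), axis=0), (K, 1))
        self.pi = np.full(K, 1.0 / K)
        self.m = np.array([[s[y == c].mean() for c in (0, 1)]] * K)
        self.v = np.array([[s[y == c].var() for c in (0, 1)]] * K)
        for _ in range(n_iter):
            # E-step: r[n, k] = p(k | s_n, y_n, e1_n, e2_n); no condition labels used.
            log_r = (np.log(self.pi) + self._log_emb(e1, e2)
                     + log_gauss(s[:, None], self.m[:, y].T, self.v[:, y].T))
            r = np.exp(log_r - logsumexp(log_r)[:, None])
            nk = r.sum(0) + 1e-10
            # M-step: responsibility-weighted maximum-likelihood updates.
            self.pi = nk / N
            self.mu = (r.T @ e1 + r.T @ e2) / (2 * nk[:, None])
            ex2 = (r.T @ e1 ** 2 + r.T @ e2 ** 2) / (2 * nk[:, None])
            self.var = np.maximum(ex2 - self.mu ** 2, 1e-6)
            for c in (0, 1):
                rc = r * (y == c)[:, None]
                nc = rc.sum(0) + 1e-10
                self.m[:, c] = rc.T @ s / nc
                self.v[:, c] = np.maximum(rc.T @ s ** 2 / nc - self.m[:, c] ** 2, 1e-6)
        return self

    def llr(self, s, e1, e2):
        """log p(s | target, e1, e2) - log p(s | non-target, e1, e2)."""
        log_w = np.log(self.pi) + self._log_emb(e1, e2)
        log_w = log_w - logsumexp(log_w)[:, None]  # log p(k | e1, e2)
        num = logsumexp(log_w + log_gauss(s[:, None], self.m[:, 1], self.v[:, 1]))
        den = logsumexp(log_w + log_gauss(s[:, None], self.m[:, 0], self.v[:, 0]))
        return num - den


# Synthetic demo: a "clean" and a "degraded" condition with different score shifts.
rng = np.random.default_rng(1)
N = 4000
k_true = rng.integers(0, 2, N)
y = rng.integers(0, 2, N)
centre = np.array([[3.0, 3.0], [-3.0, -3.0]])
e1 = centre[k_true] + rng.normal(size=(N, 2))
e2 = centre[k_true] + rng.normal(size=(N, 2))
s = np.where(y == 1, np.array([4.0, 1.0])[k_true], np.array([0.0, -1.0])[k_true])
s = s + rng.normal(size=N)
model = GCASketch(n_conditions=2).fit(s, y, e1, e2)
probe = np.array([2.0])  # the same raw score evaluated under each condition
llr_clean = model.llr(probe, centre[:1], centre[:1])[0]
llr_noisy = model.llr(probe, centre[1:], centre[1:])[0]
```

In the demo, the true condition labels `k_true` are used only to generate the data and are never passed to `fit`. The same raw score of 2.0 should receive a noticeably larger LLR under the degraded condition, where it lies well above the non-target mean, than under the clean condition, where it sits midway between the two class means. That difference is the adaptive behaviour a single global transform cannot express.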