Advances in cross-lingual and cross-source audio-visual speaker recognition: The JHU-MIT system for NIST SRE21
We present a condensed description of the joint effort of JHUCLSP/HLTCOE, MIT-LL and AGH for NIST SRE21. NIST SRE21 consisted of speaker detection over multilingual conversational telephone speech (CTS) and audio from video (AfV). Besides the regular audio track, the evaluation also contains visual (face recognition) and multi-modal tracks. This...