Publications

Refine Results

(Filters Applied) Clear All

USSS-MITLL 2010 human assisted speaker recognition

Summary

The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes and our fusion of these processes, which are adapted from USSS casework. The USSS-MIT/LL 2010 HASR results are presented. We also present post-evaluation results. The results are encouraging within the resolving power of the evaluation, which was limited to enable reasonable levels of human effort. Future ideas and efforts are discussed, including new features and capitalizing on naive listeners.
READ LESS

Summary

The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes and our fusion of these processes, which are adapted from...

READ MORE

Forensic speaker recognition: a need for caution

Summary

There has long been a desire to be able to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives, and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or innocence. Challenges, realities, and cautions regarding the use of speaker recognition applied to forensic-quality samples are presented.
READ LESS

Summary

There has long been a desire to be able to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives, and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or...

READ MORE

Beyond frame independence: parametric modelling of time duration in speaker and language recognition

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 767-770.

Summary

In this work, we address the question of generating accurate likelihood estimates from multi-frame observations in speaker and language recognition. Using a simple theoretical model, we extend the basic assumption of independent frames to include two refinements: a local correlation model across neighboring frames, and a global uncertainty due to train/test channel mismatch. We present an algorithm for discriminative training of the resulting duration model based on logistic regression combined with a bisection search. We show that using this model we can achieve state-of-the-art performance for the NIST LRE07 task. Finally, we show that these more accurate class likelihood estimates can be combined to solve multiple problems using Bayes' rule, so that we can expand our single parametric backend to replace all six separate back-ends used in our NIST LRE submission for both closed and open sets.
READ LESS

Summary

In this work, we address the question of generating accurate likelihood estimates from multi-frame observations in speaker and language recognition. Using a simple theoretical model, we extend the basic assumption of independent frames to include two refinements: a local correlation model across neighboring frames, and a global uncertainty due to...

READ MORE

MIT Lincoln Laboratory multimodal person identification system in the CLEAR 2007 Evaluation

Author:
Published in:
2nd Annual Classification of Event Activities and Relationships/Rich Transcription Evaluations, 8-11 May 2008, pp. 240-247.

Summary

A description of the MIT Lincoln Laboratory system used in the person identification task of the recent CLEAR 2007 Evaluation is documented in this paper. This task is broken into audio, visual, and multimodal subtasks. The audio identification system utilizes both a GMM and a SVM subsystem, while the visual (face) identification system utilizes an appearance-based [Kernel] approach for identification. The audio channels, originating from a microphone array, were preprocessed with beamforming and noise preprocessing.
READ LESS

Summary

A description of the MIT Lincoln Laboratory system used in the person identification task of the recent CLEAR 2007 Evaluation is documented in this paper. This task is broken into audio, visual, and multimodal subtasks. The audio identification system utilizes both a GMM and a SVM subsystem, while the visual...

READ MORE