Publications
The JHU-MIT System Description for NIST SRE19 AV
Summary
This document describes the SRE19 AV submission by the team composed of JHU-CLSP, JHU-HLTCOE and MIT Lincoln Labs. All the systems developed for the audio and video conditions consisted of neural network embeddings with some flavor of PLDA/cosine back-end. Primary fusions obtained an Actual DCF of 0.250 on the SRE18 VAST eval set, 0.183...
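The Actual DCF figures above are NIST's normalized detection cost evaluated at the Bayes decision threshold, so they reward systems whose scores are well calibrated log-likelihood ratios. The sketch below is a minimal illustration under stated assumptions: a cosine back-end score between two embeddings, and the normalized cost C_norm = P_miss + beta * P_fa with beta = (C_fa / C_miss) * (1 - P_target) / P_target, thresholded at log(beta). The helper names, the unit costs, and the target prior used in the example are illustrative assumptions, not values taken from the SRE19 evaluation plan (NIST's primary metric may also average this cost over more than one target prior). A raw cosine score would need calibration into an LLR before the actual DCF is meaningful.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm


def actual_dcf(llr_scores, labels, p_target, c_miss=1.0, c_fa=1.0):
    """Normalized detection cost at the Bayes threshold for calibrated LLR scores.

    labels: True for target trials, False for non-target trials.
    Unit costs are illustrative defaults, not an official evaluation setting.
    """
    beta = (c_fa * (1.0 - p_target)) / (c_miss * p_target)
    threshold = math.log(beta)  # accept a trial when its LLR exceeds log(beta)
    targets = [s for s, is_tgt in zip(llr_scores, labels) if is_tgt]
    nontargets = [s for s, is_tgt in zip(llr_scores, labels) if not is_tgt]
    p_miss = sum(s <= threshold for s in targets) / len(targets)
    p_fa = sum(s > threshold for s in nontargets) / len(nontargets)
    return p_miss + beta * p_fa


# Example with hypothetical, already-calibrated LLRs and an assumed P_target of 0.05.
llrs = [3.5, 3.1, 1.0, -5.0, 0.5, 3.0]
is_target = [True, True, True, False, False, False]
cost = actual_dcf(llrs, is_target, p_target=0.05)
```

With P_target = 0.05, beta is 19, so the single false alarm (LLR 3.0 above log 19, about 2.94) costs far more than the single miss: the example yields 1/3 + 19/3, roughly 6.67.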
Corpora design and score calibration for text dependent pronunciation proficiency recognition
Summary
This work investigates methods for improving a pronunciation proficiency recognition system, both in phonetic-level posterior probability calibration and in ordinal utterance-level classification, for Modern Standard Arabic (MSA), Spanish and Russian. To support this work, utterance-level labels were obtained by crowd-sourcing the annotation of language learners'...
State-of-the-art speaker recognition for telephone and video speech: the JHU-MIT submission for NIST SRE18
Summary
We present a condensed description of the joint effort of JHU-CLSP, JHU-HLTCOE, MIT-LL, MIT CSAIL and LSE-EPITA for NIST SRE18. All the developed systems consisted of x-vector/i-vector embeddings with some flavor of PLDA back-end. Very deep x-vector architectures (Extended and Factorized TDNN, and ResNets) clearly outperformed shallower x-vectors and i-vectors. The...
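A defining step in x-vector networks is a statistics pooling layer that turns a variable-length sequence of frame-level outputs into a fixed-length vector by concatenating the per-dimension mean and standard deviation over time; segment-level layers after it produce the embedding. The framework-free sketch below shows only that pooling step and is not taken from the submission described above; the function name is hypothetical, it uses the population standard deviation, and it omits the small variance floor that practical implementations often add for numerical stability.

```python
import math


def stats_pooling(frames):
    """Concatenate per-dimension mean and standard deviation over frames.

    frames: list of frame-level feature vectors (equal-length lists of floats).
    Returns a vector of length 2 * dim regardless of how many frames are given.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation per dimension (divide by n, no variance floor).
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return means + stds


pooled = stats_pooling([[1.0, 2.0], [3.0, 2.0]])
```

Because the output length depends only on the feature dimension, utterances of any duration map to vectors of the same size, which is what allows a fixed-size embedding to be extracted downstream.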
Artificial intelligence: short history, present developments, and future outlook, final report
Summary
The Director's Office at MIT Lincoln Laboratory (MIT LL) requested a comprehensive study on artificial intelligence (AI) focusing on present applications and future science and technology (S&T) opportunities in the Cyber Security and Information Sciences Division (Division 5). This report elaborates on the main results from the study. Since the...
Multi-lingual deep neural networks for language recognition
Summary
Multi-lingual feature extraction using bottleneck layers in deep neural networks (BN-DNNs) has been proven to be an effective technique for low resource speech recognition and more recently for language recognition. In this work we investigate the impact on language recognition performance of the multi-lingual BN-DNN architecture and training configurations for...
Speaker recognition using real vs synthetic parallel data for DNN channel compensation
Summary
Recent work has shown large performance gains using denoising DNNs for speech processing tasks under challenging acoustic conditions. However, training these DNNs requires large amounts of parallel multichannel speech data which can be impractical or expensive to collect. The effective use of synthetic parallel data as an alternative has been...
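One common way to obtain synthetic parallel data is to corrupt clean recordings with additive noise at a controlled signal-to-noise ratio, which yields time-aligned clean/noisy pairs for training a denoising DNN. The summary above does not say how the authors synthesized their data, so the sketch below is an illustrative assumption: it covers additive noise only (no reverberation or channel simulation), and `mix_at_snr` is a hypothetical helper name.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR in dB.

    clean, noise: equal-length lists of samples (floats).
    """
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Pick gain g so that p_clean / (g**2 * p_noise) equals 10 ** (snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(clean, noise, snr_db=0.0)
pair = (clean, noisy)  # one synthetic parallel training example
```

The clean signal serves as the target and the mixture as the input, so many pairs can be generated from one clean corpus by varying the noise source and SNR.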
The MITLL NIST LRE 2015 Language Recognition System
Summary
In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First...
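The summary mentions a fusion of five core classifiers without stating the fusion method. A widely used approach is a linear fusion, where each trial's fused score is an offset plus a weighted sum of the individual systems' scores, with weights usually trained (for example by logistic regression) on development data. The sketch below assumes that linear form for illustration only; the helper name, weights, and offset are hypothetical, and for language recognition the same weighted sum would be applied to each target language's score.

```python
def linear_fusion(system_scores, weights, offset):
    """Fuse per-trial scores from several classifiers with a weighted sum.

    system_scores: one list of per-trial scores per classifier (equal lengths).
    weights: one weight per classifier; offset: scalar bias added to every trial.
    """
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    n_trials = len(system_scores[0])
    return [
        offset + sum(w * scores[t] for w, scores in zip(weights, system_scores))
        for t in range(n_trials)
    ]


# Two hypothetical systems scored on two trials, with made-up fusion parameters.
fused = linear_fusion([[1.0, 2.0], [3.0, -1.0]], weights=[0.5, 0.25], offset=0.1)
```

Keeping the fusion linear makes the trained weights easy to inspect and, when trained with a logistic-regression objective, also calibrates the fused output.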
Channel compensation for speaker recognition using MAP adapted PLDA and denoising DNNs
Summary
Over several decades, speaker recognition performance has steadily improved for applications using telephone speech. A big part of this improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. For new data applications, such as audio from room microphones, we would like to effectively use existing...
A unified deep neural network for speaker and language recognition
Summary
Significant performance gains have been reported separately for speaker recognition (SR) and language recognition (LR) tasks using either DNN posteriors of sub-phonetic units or DNN feature representations, but the two techniques have not been compared on the same SR or LR task or across SR and LR tasks using the...
Deep neural network approaches to speaker and language recognition
Summary
The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR). Prior work has shown performance gains for separate SR and LR tasks using DNNs for...
