Publications

Investigation of the relationship of vocal, eye-tracking, and fMRI ROI time-series measures with preclinical mild traumatic brain injury*

Summary

In this work, we examine correlations between vocal articulatory features, ocular smooth pursuit measures, and features from the fMRI BOLD response in region-of-interest (ROI) time series in a high school athlete population susceptible to repeated head impact within a sports season. Initial results indicate relationships between vocal features and brain ROIs that may show which components of the neural speech networks are affected by preclinical mild traumatic brain injury (mTBI). The data used for this study were collected by Purdue University from 32 high school athletes over the entirety of a sports season (Helfer, et al., 2014) and include fMRI measurements made pre-season, in-season, and post-season. The athletes are 25 male football players and 7 female soccer players. The Immediate Post-Concussion Assessment and Cognitive Testing suite (ImPACT) was used to assess cognitive performance (Broglio, Ferrara, Macciocchi, Baumgartner, & Elliott, 2007). The test is made up of six sections, which measure verbal memory, visual memory, visual motor speed, reaction time, impulse control, and a total symptom composite. For each test, a threshold is set for a change in cognitive performance. The threshold for each test is defined as a decline from baseline that exceeds one standard deviation, where the standard deviation is computed over the change from baseline across all subjects’ test scores. Speech features were extracted from audio recordings of the Grandfather Passage, which provides a standardized and phonetically balanced sample of speech. Oculomotor testing included two experimental conditions. In the smooth pursuit condition, a single target moved circularly at constant speed. In the saccade condition, a target jumped among three locations along the horizontal midline of the screen. In both trial types, subjects visually tracked the targets during the trials, which lasted for one minute.
The fMRI features are derived from the BOLD time-series data from resting-state fMRI scans of the subjects. The pre-processing of the resting-state fMRI and accompanying structural MRI data (for atlas registration) was performed with the CONN toolkit (Whitfield-Gabrieli & Nieto-Castanon, 2012). Functional connectivity was generated using cortical and sub-cortical atlas registrations. This investigation explores correlations between these three modalities and a cognitive performance assessment.
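
The decline threshold described in the summary can be illustrated with a minimal sketch. It is not the authors' code: the function name `flag_declines`, the `higher_is_better` switch, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration only.

```python
import numpy as np

def flag_declines(baseline, followup, higher_is_better=True):
    """Flag subjects whose decline from baseline exceeds one standard deviation.

    The standard deviation is taken over the change from baseline across
    all subjects, as described in the summary. ddof=1 (sample standard
    deviation) is an assumption; the summary does not specify it.
    """
    baseline = np.asarray(baseline, dtype=float)
    followup = np.asarray(followup, dtype=float)
    change = followup - baseline
    if not higher_is_better:
        # For scores where larger is worse (e.g. reaction time),
        # flip the sign so that a negative change always means decline.
        change = -change
    threshold = np.std(change, ddof=1)
    return change < -threshold, threshold

# Hypothetical scores for four subjects on one test section.
flags, threshold = flag_declines([50, 50, 50, 50], [50, 50, 50, 30])
print(flags, threshold)
```

With these made-up scores, only the fourth subject's drop of 20 points exceeds the one-standard-deviation threshold of 10.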

The MIT Lincoln Laboratory/JHU/EPITA-LSE LRE17 System

Summary

Competitive international language recognition evaluations have been hosted by NIST for over two decades. This paper describes the MIT Lincoln Laboratory (MITLL) and Johns Hopkins University (JHU) submission for the recent 2017 NIST language recognition evaluation (LRE17) [1]. The MITLL/JHU LRE17 submission represents a collaboration between researchers at MITLL and JHU with multiple sub-systems reflecting a range of language recognition technologies, including traditional MFCC/SDC i-vector systems, deep neural network (DNN) bottleneck-feature-based i-vector systems, state-of-the-art DNN x-vector systems, and a sparse coding system. Each sub-system uses the same backend processing for domain adaptation and score calibration. Multiple sub-systems were fused using a simple logistic regression [2] to create system combinations. The MITLL/JHU submissions were selected based on the top-ranking combinations of up to 5 sub-systems using development data provided by NIST. The MITLL/JHU primary submitted systems attained a Cavg of 0.181 and 0.163 for the fixed and open conditions, respectively. Post-evaluation analysis revealed the importance of carefully partitioning the development data, using augmented training data, and using a condition-dependent backend. Addressing these issues, including retraining the x-vector system with augmented data, yielded gains in performance of over 17%: a Cavg of 0.149 for the fixed condition and 0.132 for the open condition.
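
The "over 17%" figure can be checked from the Cavg values quoted in the summary, assuming the gain is the relative reduction in Cavg. The helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Relative reduction in a cost metric (lower is better)."""
    return (before - after) / before

# Cavg values quoted in the summary: primary submission vs. post-evaluation.
fixed = relative_reduction(0.181, 0.149)
open_ = relative_reduction(0.163, 0.132)
print(f"fixed: {fixed:.1%}, open: {open_:.1%}")
# prints "fixed: 17.7%, open: 19.0%"
```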

Corpora for the evaluation of robust speaker recognition systems

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

The goal of this paper is to describe significant corpora available to support speaker recognition research and evaluation, along with details about the corpora collection and design. We describe the attributes of high-quality speaker recognition corpora. Considerations of the application, domain, and performance metrics are also discussed. Additionally, a literature survey of corpora used in speaker recognition research over the last 10 years is presented. Finally, we show the most common corpora used in the research community and review their success in enabling meaningful speaker recognition research.

Speaker linking and applications using non-parametric hashing methods

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

Large unstructured audio data sets have become ubiquitous and present a challenge for organization and search. One logical approach for structuring data is to find common speakers and link occurrences across different recordings. Prior approaches to this problem have focused on basic methodology for the linking task. In this paper, we introduce a novel trainable nonparametric hashing method for indexing large speaker recording data sets. This approach leads to tunable computational complexity methods for speaker linking. We focus on a scalable clustering method based on hashing canopy-clustering. We apply this method to a large corpus of speaker recordings, demonstrate performance tradeoffs, and compare to other hashing methods.
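
For readers unfamiliar with canopy clustering, the following is a minimal sketch of the generic two-threshold algorithm over speaker embedding vectors. It is not the paper's method: the trainable hashing step is omitted, and the function name `canopy_clusters`, the cosine distance, and both thresholds are assumptions for illustration.

```python
import numpy as np

def canopy_clusters(X, t_loose, t_tight):
    """Generic canopy clustering with cosine distance.

    Points within t_loose of a canopy center join that canopy; points
    within t_tight are removed from further consideration as centers.
    Returns a list of (center_index, member_indices) pairs.
    """
    assert t_loose >= t_tight
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    remaining = list(range(len(Xn)))
    canopies = []
    while remaining:
        center = remaining[0]
        dist = 1.0 - Xn[remaining] @ Xn[center]  # cosine distance to center
        members = [i for i, d in zip(remaining, dist) if d <= t_loose]
        canopies.append((center, members))
        # Always drop the center itself so the loop is guaranteed to end.
        remaining = [i for i, d in zip(remaining, dist)
                     if d > t_tight and i != center]
    return canopies

# Four hypothetical 2-D "speaker vectors" forming two obvious groups.
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
print(canopy_clusters(vectors, t_loose=0.5, t_tight=0.1))
```

The gap between the loose and tight thresholds is what makes the cost tunable: a wider gap yields larger, overlapping canopies that a finer clustering pass can refine afterwards.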

Language recognition via sparse coding

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

Spoken language recognition requires a series of signal processing steps and learning algorithms to model distinguishing characteristics of different languages. In this paper, we present a sparse discriminative feature learning framework for language recognition. We use sparse coding, an unsupervised method, to compute efficient representations for spectral features from a speech utterance while learning basis vectors for language models. Differentiated from existing approaches in sparse representation classification, we introduce a maximum a posteriori (MAP) adaptation scheme based on online learning that further optimizes the discriminative quality of sparse-coded speech features. We empirically validate the effectiveness of our approach using the NIST LRE 2015 dataset.
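
As background for the sparse coding step, the sketch below computes a sparse code for one feature vector against a fixed dictionary by solving the usual L1-regularized least-squares problem with ISTA (iterative soft thresholding). ISTA is a stand-in solver chosen for brevity; the paper's dictionary learning and MAP adaptation are not shown, and the names `sparse_code`, `D`, and `lam` are assumptions.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5 * ||x - D a||^2 + lam * ||a||_1 with ISTA.

    x: feature vector of shape (d,); D: dictionary of shape (d, k),
    one basis vector per column. Returns the sparse code a, shape (k,).
    """
    # Lipschitz constant of the smooth term's gradient: the squared
    # spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)          # gradient of the quadratic term
        z = a - grad / L                  # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# With an identity dictionary the solution is plain soft thresholding of x.
code = sparse_code(np.array([1.0, 0.05, -0.5]), np.eye(3), lam=0.1)
print(code)
```

Larger values of `lam` drive more coefficients to exactly zero, trading reconstruction accuracy for sparsity.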

The MITLL NIST LRE 2015 Language Recognition System

Summary

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks respectively.

Multimodal sparse coding for event detection

Published in:
Neural Information Processing Multimodal Machine Learning Workshop, NIPS 2015, 7-12 December 2015.

Summary

Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature learning methods such as GMM supervectors and sparse RBM. We report the cross-validated classification accuracy and mean average precision of the MED system trained on features learned from our unimodal and multimodal settings for a subset of the TRECVID MED 2014 dataset.

Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks

Summary

The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. The preprocessing and use of this data has typically been performed with telephone bandwidth and quantization. Although this approach is viable, it ignores the richer properties of the microphone data: multiple channels, high-rate sampling, linear encoding, ambient noise properties, etc. In this paper, we explore alternate choices of preprocessing and examine their effects on speaker recognition performance. Specifically, we consider the effects of quantization, sampling rate, enhancement, and two-channel speech activity detection. Experiments on the NIST 2010 SRE interview microphone corpus demonstrate that performance can be dramatically improved with a different preprocessing chain.

The MITLL NIST LRE 2011 language recognition system

Summary

This paper presents a description of the MIT Lincoln Laboratory (MITLL) language recognition system developed for the NIST 2011 Language Recognition Evaluation (LRE). The submitted system consisted of a fusion of four core classifiers, three based on spectral similarity and one based on tokenization. Additional system improvements were achieved following the submission deadline. In a major departure from previous evaluations, the 2011 LRE task focused on closed-set pairwise performance so as to emphasize a system's ability to distinguish confusable language pairs. Results are presented for the 24-language confusable pair task at test utterance durations of 30, 10, and 3 seconds. Results are also shown using the standard detection metrics (DET, minDCF), and it is demonstrated that the previous metrics adequately cover difficult pair performance. On the 30 s 24-language confusable pair task, the submitted and post-evaluation systems achieved average costs of 0.079 and 0.070 and standard detection costs of 0.038 and 0.033.

A new perspective on GMM subspace compensation based on PPCA and Wiener filtering

Published in:
2011 INTERSPEECH, 27-31 August 2011, pp. 145-148.

Summary

We present a new perspective on the subspace compensation techniques that currently dominate the field of speaker recognition using Gaussian Mixture Models (GMMs). Rather than the traditional factor analysis approach, we use Gaussian modeling in the sufficient statistic supervector space combined with Probabilistic Principal Component Analysis (PPCA) within-class and shared across class covariance matrices to derive a family of training and testing algorithms. Key to this analysis is the use of two noise terms for each speech cut: a random channel offset and a length-dependent observation noise. Using the Wiener filtering perspective, formulas for optimal train and test algorithms for Joint Factor Analysis (JFA) are simple to derive. In addition, we show that an alternative form of Wiener filtering results in the i-vector approach, thus tying together these two disparate techniques.