Publications

Refine Results

(Filters Applied) Clear All

The MIT Lincoln Laboratory/JHU/EPITA-LSE LRE17 System

Summary

Competitive international language recognition evaluations have been hosted by NIST for over two decades. This paper describes the MIT Lincoln Laboratory (MITLL) and Johns Hopkins University (JHU) submission for the recent 2017 NIST language recognition evaluation (LRE17) [1]. The MITLL/JHU LRE17 submission represents a collaboration between researchers at MITLL and JHU with multiple sub-systems reflecting a range of language recognition technologies including traditional MFCC/SDC i-vector systems, deep neural network (DNN) bottleneck feature based i-vector systems, state-of-the-art DNN x-vector systems and a sparse coding system. Each sub-systems uses the same backend processing for domain adaptation and score calibration. Multiple sub-systems were fused using a simple logistic regression ([2]) to create system combinations. The MITLL/JHU submissions were selected based on the top ranking combinations of up to 5 sub-systems using development data provided by NIST. The MITLL/JHU primary submitted systems attained a Cavg of 0.181 and 0.163 for the fixed and open conditions respectively. Post evaluation analysis revealed the importance of carefully partitioning for the development data, using augmented training data and using a condition dependent backend. Addressing these issues - including retraining the x-vector system with augmented data - yielded gains in performance of over 17%: a Cavg of 0.149 for the fixed condition and 0.132 for the open condition.
READ LESS

Summary

Competitive international language recognition evaluations have been hosted by NIST for over two decades. This paper describes the MIT Lincoln Laboratory (MITLL) and Johns Hopkins University (JHU) submission for the recent 2017 NIST language recognition evaluation (LRE17) [1]. The MITLL/JHU LRE17 submission represents a collaboration between researchers at MITLL and...

READ MORE

Multi-modal audio, video and physiological sensor learning for continuous emotion prediction

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity for investigating multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with minimal prosodic-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state space estimation approach is applied for score fusion that demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set with an achieved Concordant Correlation Coefficient (CCC) for arousal of 0.770 vs 0.702 (baseline) and for valence of 0.687 vs 0.638. Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
READ LESS

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the...

READ MORE

Detecting depression using vocal, facial and semantic communication cues

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with the human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.
READ LESS

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In...

READ MORE

Sparse-coded net model and applications

Summary

As an unsupervised learning method, sparse coding can discover high-level representations for an input in a large variety of learning problems. Under semi-supervised settings, sparse coding is used to extract features for a supervised task such as classification. While sparse representations learned from unlabeled data independently of the supervised task perform well, we argue that sparse coding should also be built as a holistic learning unit optimizing on the supervised task objectives more explicitly. In this paper, we propose sparse-coded net, a feedforward model that integrates sparse coding and task-driven output layers, and describe training methods in detail. After pretraining a sparse-coded net via semi-supervised learning, we optimize its task-specific performance in a novel backpropagation algorithm that can traverse nonlinear feature pooling operators to update the dictionary. Thus, sparse-coded net can be applied to supervised dictionary learning. We evaluate sparse-coded net with classification problems in sound, image, and text data. The results confirm a significant improvement over semi-supervised learning as well as superior classification performance against deep stacked autoencoder neural network and GMM-SVM pipelines in small to medium-scale settings.
READ LESS

Summary

As an unsupervised learning method, sparse coding can discover high-level representations for an input in a large variety of learning problems. Under semi-supervised settings, sparse coding is used to extract features for a supervised task such as classification. While sparse representations learned from unlabeled data independently of the supervised task...

READ MORE

Language recognition via sparse coding

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

Spoken language recognition requires a series of signal processing steps and learning algorithms to model distinguishing characteristics of different languages. In this paper, we present a sparse discriminative feature learning framework for language recognition. We use sparse coding, an unsupervised method, to compute efficient representations for spectral features from a speech utterance while learning basis vectors for language models. Differentiated from existing approaches in sparse representation classification, we introduce a maximum a posteriori (MAP) adaptation scheme based on online learning that further optimizes the discriminative quality of sparse-coded speech features. We empirically validate the effectiveness of our approach using the NIST LRE 2015 dataset.
READ LESS

Summary

Spoken language recognition requires a series of signal processing steps and learning algorithms to model distinguishing characteristics of different languages. In this paper, we present a sparse discriminative feature learning framework for language recognition. We use sparse coding, an unsupervised method, to compute efficient representations for spectral features from a...

READ MORE

A reverse approach to named entity extraction and linking in microposts

Published in:
Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 67-69.

Summary

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions that are not co-referent with any entities in the knowledge base.
READ LESS

Summary

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions...

READ MORE

Blind signal classification via sparse coding

Published in:
IEEE Int. Conf. Computer Communications, IEEE INFOCOM 2016, 10-15 April 2016.

Summary

We propose a novel RF signal classification method based on sparse coding, an unsupervised learning method popular in computer vision. In particular, we employ a convolutional sparse coder that can extract high-level features by computing the maximal similarity between an unknown received signal against an overcomplete dictionary of matched filter templates. Such dictionary can be either generated or trained in an unsupervised fashion from signal examples labeled with no ground truths. The computed sparse code then is applied to train SVM classifiers to discriminate RF signals. As a result, the proposed approach can achieve blind signal classification that requires no prior knowledge (e.g., MCS, pulse shaping) about the signals present in an arbitrary RF channel. Since modulated RF signals undergo pulse shaping to aid the matched filter detection by a receiver for the same radio protocol, our method can exploit variability in relative similarity against the dictionary atoms as the key discriminating factor for SVM. We present an empirical validation of our approach. The results indicate that we can separate different classes of digitally modulated signals from blind sampling with 70.3% recall and 24.6% false alarm at 10 dB SNR. If a labeled dataset were available for supervised classifier training, we can enhance the classification accuracy to 87.8% recall and 14.1% false alarm.
READ LESS

Summary

We propose a novel RF signal classification method based on sparse coding, an unsupervised learning method popular in computer vision. In particular, we employ a convolutional sparse coder that can extract high-level features by computing the maximal similarity between an unknown received signal against an overcomplete dictionary of matched filter...

READ MORE

Competing cognitive resilient networks

Published in:
IEEE Trans. Cognit. Commun. and Netw., Vol. 2, No. 1, March 2016, pp. 95-109.

Summary

We introduce competing cognitive resilient network (CCRN) of mobile radios challenged to optimize data throughput and networking efficiency under dynamic spectrum access and adversarial threats (e.g., jamming). Unlike the conventional approaches, CCRN features both communicator and jamming nodes in a friendly coalition to take joint actions against hostile networking entities. In particular, this paper showcases hypothetical blue force and red force CCRNs and their competition for open spectrum resources. We present state-agnostic and stateful solution approaches based on the decision theoretic framework. The state-agnostic approach builds on multiarmed bandit to develop an optimal strategy that enables the exploratory-exploitative actions from sequential sampling of channel rewards. The stateful approach makes an explicit model of states and actions from an underlying Markov decision process and uses multiagent Q-learning to compute optimal node actions. We provide a theoretical framework for CCRN and propose new algorithms for both approaches. Simulation results indicate that the proposed algorithms outperform some of the most important algorithms known to date.
READ LESS

Summary

We introduce competing cognitive resilient network (CCRN) of mobile radios challenged to optimize data throughput and networking efficiency under dynamic spectrum access and adversarial threats (e.g., jamming). Unlike the conventional approaches, CCRN features both communicator and jamming nodes in a friendly coalition to take joint actions against hostile networking entities...

READ MORE

Multimodal sparse coding for event detection

Published in:
Neural Information Processing Multimodal Machine Learning Workshop, NIPS 2015, 7-12 December 2015.

Summary

Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature learning methods such as GMM supervectors and sparse RBM. We report the cross-validated classification accuracy and mean average precision of the MED system trained on features learned from our unimodal and multimodal settings for a subset of the TRECVID MED 2014 dataset.
READ LESS

Summary

Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature...

READ MORE

Fast online learning of antijamming and jamming strategies

Published in:
2015 IEEE Global Communications Conf., 6-10 December 2015.

Summary

Competing Cognitive Radio Network (CCRN) coalesces communicator (comm) nodes and jammers to achieve maximal networking efficiency in the presence of adversarial threats. We have previously developed two contrasting approaches for CCRN based on multi-armed bandit (MAB) and Qlearning. Despite their differences, both approaches have shown to achieve optimal throughput performance. This paper addresses a harder class of problems where channel rewards are time-varying such that learning based on stochastic assumptions cannot guarantee the optimal performance. This new problem is important because an intelligent adversary will likely introduce dynamic changepoints, which can make our previous approaches ineffective. We propose a new, faster learning algorithm using online convex programming that is computationally simpler and stateless. According to our empirical results, the new algorithm can almost instantly find an optimal strategy that achieves the best steady-state channel rewards.
READ LESS

Summary

Competing Cognitive Radio Network (CCRN) coalesces communicator (comm) nodes and jammers to achieve maximal networking efficiency in the presence of adversarial threats. We have previously developed two contrasting approaches for CCRN based on multi-armed bandit (MAB) and Qlearning. Despite their differences, both approaches have shown to achieve optimal throughput performance...

READ MORE

Showing Results

1-10 of 12