Publications

Approaches for Language Identification in Mismatched Environments

Date:
December 13, 2016
Published in:
Proceedings of SLT 2016, San Diego, Calif.
Type:
Conference Paper

Summary

In this paper, we consider the task of language identification in the context of mismatch conditions. Specifically, we address the issue of using unlabeled data in the domain of interest to improve the performance of a state-of-the-art system.
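
One common way unlabeled in-domain data is exploited in recognition back ends is to re-estimate centering and whitening statistics from it. The sketch below illustrates that generic idea on toy embeddings; it is an assumption for illustration, not necessarily the method used in the paper:

```python
import numpy as np

def indomain_whitener(unlabeled_vecs, eps=1e-8):
    """Estimate centering + ZCA whitening stats from unlabeled in-domain vectors."""
    mu = unlabeled_vecs.mean(axis=0)
    cov = np.cov(unlabeled_vecs, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return mu, W

# Toy unlabeled "in-domain" embeddings with a shifted, scaled distribution
rng = np.random.default_rng(1)
Z = 3.0 * rng.standard_normal((500, 10)) + 5.0

mu, W = indomain_whitener(Z)
Z_adapted = (Z - mu) @ W   # centered and whitened for the new domain
```

No labels are needed for this step, which is what makes it attractive under mismatch: the statistics come entirely from the unlabeled in-domain collection.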

Detecting Depression using Vocal, Facial and Semantic Communication Cues

Date:
October 15, 2016
Published in:
Proceedings of the Audio Visual Emotion Challenge and Workshop, Amsterdam, The Netherlands
Type:
Conference Paper

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. The neurophysiological changes are associated with a decline in the dynamics and coordination of speech and facial motor control, while the neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities.

Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction

Date:
October 15, 2016
Published in:
Proceedings of 2016 AVEC Workshop, ACM Multimedia
Type:
Conference Paper

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities.
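
The simplest form of such fusion is a weighted average of per-modality predictions (late fusion). The sketch below uses made-up predictions and placeholder weights purely to illustrate the idea; it is not the challenge system itself:

```python
import numpy as np

# Hypothetical per-modality emotion (e.g., arousal) predictions over three time steps
preds = {
    "audio":  np.array([0.2, 0.4, 0.6]),
    "video":  np.array([0.1, 0.5, 0.7]),
    "physio": np.array([0.3, 0.3, 0.5]),
}

# Placeholder fusion weights, e.g., proportional to each modality's validation score
weights = {"audio": 0.5, "video": 0.3, "physio": 0.2}

# Late fusion: weighted average of the modality outputs at each time step
fused = sum(weights[m] * preds[m] for m in preds)
print(fused)   # [0.19 0.41 0.61]
```

In practice the weights (or a learned fusion model) are chosen on held-out data, and modalities with missing sensors can simply be dropped from the sum with the weights renormalized.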

How Deep Neural Networks Can Improve Emotion Recognition on Video Data

Date:
September 25, 2016
Published in:
Proceedings of 2016 IEEE International Conference on Image Processing (ICIP)
Type:
Conference Paper

Summary

Deep learning has produced many impressive results on emotion recognition tasks in the last few years. In this work, we present a system that performs emotion recognition on video data using both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

I-Vector Speaker and Language Recognition System on Android

Date:
September 13, 2016
Published in:
Proceedings of IEEE High Performance Extreme Computing Conference (HPEC '16)
Type:
Conference Paper

Summary

I-vector-based speaker and language identification provides state-of-the-art performance. However, this performance comes at the cost of increased computational complexity, which can pose challenges on resource-limited devices such as phones and tablets. We present an implementation of an i-vector speaker and language recognition system on the Android platform, in the form of a fully functional application that supports speaker enrollment and language/speaker scoring within mobile contexts.
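
Part of what makes i-vector back ends practical on a phone is that, once an i-vector is extracted, enrollment and scoring can reduce to vector averaging and a cosine similarity. The sketch below illustrates that generic back end with made-up vectors; it is not the application's actual code:

```python
import numpy as np

def enroll(utterance_ivecs):
    """Build a speaker model by averaging enrollment i-vectors, then length-normalizing."""
    model = np.mean(utterance_ivecs, axis=0)
    return model / np.linalg.norm(model)

def score(model, test_ivec):
    """Cosine-similarity score, a lightweight back end commonly paired with i-vectors."""
    return float(model @ (test_ivec / np.linalg.norm(test_ivec)))

# Toy 100-dimensional i-vectors (hypothetical)
rng = np.random.default_rng(0)
speaker = rng.standard_normal(100)
model = enroll([speaker + 0.1 * rng.standard_normal(100) for _ in range(3)])

same_score  = score(model, speaker + 0.1 * rng.standard_normal(100))
other_score = score(model, rng.standard_normal(100))
```

At test time, scoring is a single dot product per enrolled speaker (or language), so the computationally expensive step on a mobile device is the i-vector extraction itself rather than the scoring.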

Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral Sclerosis

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

Effective monitoring of bulbar disease progression in persons with amyotrophic lateral sclerosis (ALS) requires rapid, objective, automatic assessment of speech loss. The purpose of this work was to identify acoustic features that aid in predicting intelligibility loss and speaking rate decline in individuals with ALS.

Relating estimated cyclic spectral peak frequency to measured epilarynx length using Magnetic Resonance Imaging

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

The epilarynx plays an important role in speech production, carrying information about the individual speaker and manner of articulation. Recent spectral processing techniques isolate a unique resonance with characteristics of the epilarynx previously shown via simulation, specifically cyclicity. Using Magnetic Resonance Imaging (MRI), the present work relates this estimated cyclic peak frequency to measured epilarynx length.

Language Recognition via Sparse Coding

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

Spoken language recognition requires a series of signal processing steps and learning algorithms to model distinguishing characteristics of different languages. In this paper, we present a sparse discriminative feature learning framework for language recognition. We use sparse coding, an unsupervised method, to compute efficient representations for spectral features from a speech utterance while learning basis vectors for language models.
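
To make the sparse-coding step concrete, the sketch below computes sparse codes for toy "spectral" features against a fixed random dictionary using ISTA (iterative soft thresholding). The dictionary, dimensions, and penalty here are illustrative assumptions; the paper's framework additionally learns the basis vectors rather than fixing them:

```python
import numpy as np

def sparse_code(X, D, alpha=0.5, n_iter=200):
    """ISTA: minimize 0.5*||X - A @ D||^2 + alpha*||A||_1 over the codes A."""
    L = np.linalg.norm(D @ D.T, 2)              # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T                # gradient of the quadratic term
        A = A - grad / L                        # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - alpha / L, 0.0)   # soft threshold
    return A

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 20))               # 32 basis vectors for 20-dim features
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms
X = rng.standard_normal((50, 20))               # toy "spectral features", 50 frames
A = sparse_code(X, D)                           # sparse code per frame
```

The L1 penalty drives most code coefficients to exactly zero, so each frame is represented by a small number of active basis vectors; these activation patterns are the features a language classifier can then be trained on.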

Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel Compensation

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

Recently there has been a great deal of interest in using deep neural networks (DNNs) for channel compensation under reverberant or noisy channel conditions such as those found in microphone data. This paper compares the use of real and synthetic data for training denoising DNNs for multi-microphone speaker recognition.

The AFRL-MITLL WMT16 News-Translation Task Systems

Date:
August 16, 2016
Published in:
Proceedings of the 11th Workshop on Machine Translation (WMT’16)
Type:
Conference Paper

Summary

This paper describes the AFRL-MITLL statistical machine translation systems and the improvements developed during the WMT16 evaluation campaign. New techniques applied this year include neural machine translation, a unique selection process for language-modeling data, additional out-of-vocabulary transliteration techniques, and morphology generation.