Publications

Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction

Date:
October 15, 2016
Published in:
Proceedings of 2016 AVEC Workshop, ACM Multimedia
Type:
Conference Paper

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications, including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities.

Use of Photoacoustic Excitation and Laser Vibrometry to Remotely Detect Trace Explosives

Date:
October 6, 2016
Published in:
Applied Optics, vol. 55, no. 32
Type:
Journal Article

Summary

In this paper, we examine a laser-based approach to remotely initiate, measure, and differentiate acoustic and vibrational emissions from trace quantities of explosive materials against their environment. Using a pulsed ultraviolet laser (266 nm), we induce a significant (>100 Pa) photoacoustic response from small quantities of military-grade explosives. The photoacoustic signal, with frequencies predominantly between 100 and 500 kHz, is detected remotely via a wideband laser Doppler vibrometer. This two-laser system can be used to rapidly detect and discriminate explosives from ordinary background materials, which have significantly weaker photoacoustic response. A 100 ng/cm² limit of detection is estimated. Photoablation is proposed as the dominant mechanism for the large photoacoustic signals generated by explosives.

How Deep Neural Networks Can Improve Emotion Recognition on Video Data

Date:
September 25, 2016
Published in:
Proceedings of 2016 IEEE International Conference on Image Processing (ICIP)
Type:
Conference Paper

Summary

There have been many impressive results obtained using deep learning for emotion recognition tasks in the last few years. In this work, we present a system that performs emotion recognition on video data using both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

The Offshore Precipitation Capability

Date:
September 16, 2016
Published in:
Project Report ATC-430, MIT Lincoln Laboratory
Type:
Project Report

Summary

The Offshore Precipitation Capability (OPC) uses machine learning and image processing methods to estimate radar-like precipitation intensity and echo top heights beyond the range of weather radar.

High-throughput Ingest of Data Provenance Records into Accumulo

Date:
September 13, 2016
Published in:
Proceedings of IEEE High Performance Extreme Computing Conference (HPEC '16)
Type:
Conference Paper

Summary

Whole-system data provenance provides deep insight into the processing of data on a system, including detecting data integrity attacks. The downside to systems that collect whole-system data provenance is the sheer volume of data that is generated under many heavy workloads. In this paper, we investigate the use of D4M and Accumulo to support high-throughput data ingest of whole-system provenance data.

I-Vector Speaker and Language Recognition System on Android

Date:
September 13, 2016
Published in:
Proceedings of IEEE High Performance Extreme Computing Conference (HPEC '16)
Type:
Conference Paper

Summary

I-Vector-based speaker and language identification provides state-of-the-art performance. However, it is a more computationally complex solution, which often poses challenges on resource-limited devices such as phones or tablets. We present the implementation of an I-Vector speaker and language recognition system on the Android platform in the form of a fully functional application that allows speaker enrollment and language/speaker scoring within mobile contexts.

Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral Sclerosis

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

Effective monitoring of bulbar disease progression in persons with amyotrophic lateral sclerosis (ALS) requires rapid, objective, automatic assessment of speech loss. The purpose of this work was to identify acoustic features that aid in predicting intelligibility loss and speaking rate decline in individuals with ALS.

Relating estimated cyclic spectral peak frequency to measured epilarynx length using Magnetic Resonance Imaging

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

The epilarynx plays an important role in speech production, carrying information about the individual speaker and manner of articulation. Recent spectral processing techniques isolate a unique resonance with characteristics of the epilarynx previously shown via simulation, specifically cyclicity. Using Magnetic Resonance Imaging (MRI), the present work relates this estimated cyclic peak frequency to measured epilarynx length.

Language Recognition via Sparse Coding

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

Spoken language recognition requires a series of signal processing steps and learning algorithms to model distinguishing characteristics of different languages. In this paper, we present a sparse discriminative feature learning framework for language recognition. We use sparse coding, an unsupervised method, to compute efficient representations for spectral features from a speech utterance while learning basis vectors for language models.

Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel Compensation

Date:
September 8, 2016
Published in:
Proceedings of Interspeech 2016, San Francisco, Calif.
Type:
Conference Paper

Summary

Recently, there has been a great deal of interest in using deep neural networks (DNNs) for channel compensation under reverberant or noisy channel conditions such as those found in microphone data. This paper compares the use of real and synthetic data for training denoising DNNs for multi-microphone speaker recognition.