Publications

Multi-modal audio, video and physiological sensor learning for continuous emotion prediction

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications, including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. The Audio/Visual Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity to investigate multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with a minimal set of prosody-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state-space estimation approach is applied for score fusion that demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set, achieving a Concordance Correlation Coefficient (CCC) of 0.770 vs. 0.702 (baseline) for arousal and 0.687 vs. 0.638 for valence. Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
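For reference, the challenge's evaluation metric is straightforward to compute. The minimal Python sketch below implements the standard Concordance Correlation Coefficient used to score arousal and valence predictions; the function name and array conventions are ours, not the paper's.

import numpy as np

def concordance_cc(pred, gold):
    # Lin's Concordance Correlation Coefficient (CCC), the AVEC 2016
    # metric for continuous arousal/valence prediction.
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mp, mg = pred.mean(), gold.mean()
    vp, vg = pred.var(), gold.var()               # population variances
    cov = ((pred - mp) * (gold - mg)).mean()      # population covariance
    return 2.0 * cov / (vp + vg + (mp - mg) ** 2)

A CCC of 1 indicates perfect agreement in both correlation and scale, which is why it is preferred over plain Pearson correlation for this task.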

Detecting depression using vocal, facial and semantic communication cues

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing, and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in the dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse-coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with a human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and a binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.
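To make the reported metrics concrete, the sketch below shows a simple late-fusion and scoring pipeline. It averages per-modality PHQ predictions as a plain stand-in for the paper's Gaussian staircase regressor fusion (which is not reproduced here), and the threshold of 10 reflects the common PHQ-8 clinical cutoff; all names are illustrative.

import numpy as np

def fuse_and_score(preds_by_modality, phq_true, threshold=10.0):
    # Late fusion: average the PHQ predictions from the vocal, facial,
    # and semantic regressors (a simple mean substitutes here for the
    # paper's Gaussian staircase fusion).
    fused = np.mean(np.stack(preds_by_modality), axis=0)
    rmse = np.sqrt(np.mean((fused - phq_true) ** 2))
    mae = np.mean(np.abs(fused - phq_true))
    depressed = fused >= threshold   # binary depression decision
    return fused, depressed, rmse, mae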

How deep neural networks can improve emotion recognition on video data

Published in:
ICIP: 2016 IEEE Int. Conf. on Image Processing, 25-28 September 2016.

Summary

We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual Emotion Challenge (AV+EC 2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving performance superior to the baseline and other competing methods.
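The core architectural pattern is a convolutional front end feeding a recurrent model. A minimal PyTorch sketch of that CNN-into-RNN idea follows; the layer sizes and the GRU choice are illustrative, not the paper's exact architecture.

import torch
import torch.nn as nn

class CNNRNN(nn.Module):
    # A small CNN encodes each video frame; a GRU models the temporal
    # evolution of the per-frame features; a linear head regresses the
    # two emotion dimensions (arousal, valence) at every time step.
    def __init__(self, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> (B*T, 32)
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, frames):                          # (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)                           # (B, T, 2)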

The Offshore Precipitation Capability

Summary

In this work, machine learning and image processing methods are used to estimate radar-like precipitation intensity and echo top heights beyond the range of weather radar. The technology, called the Offshore Precipitation Capability (OPC), combines global lightning data with existing radar mosaics, five Geostationary Operational Environmental Satellite (GOES) channels, and several fields from the Rapid Refresh (RAP) 13 km numerical weather prediction model to create precipitation and echo top fields similar to those provided by existing Federal Aviation Administration (FAA) weather systems. Preprocessing and feature extraction methods are described to construct inputs for model training. A variety of machine learning algorithms are investigated to identify which provides the most accuracy. Output from the machine learning model is blended with existing radar mosaics to create weather radar-like analyses that extend into offshore regions. The resulting fields are validated using land radars and satellite precipitation measurements provided by the National Aeronautics and Space Administration (NASA) Global Precipitation Measurement Mission (GPM) core observatory satellite. This capability is initially being developed for the Miami Oceanic airspace with the goal of providing improved situational awareness for offshore air traffic control.
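The final blending step can be pictured with a short sketch: where land-based radar coverage is good, the radar mosaic is kept; offshore, the machine-learned estimate takes over. The per-pixel radar_quality weight below is a hypothetical confidence field of ours, not a documented OPC input.

import numpy as np

def blend_fields(radar_field, ml_field, radar_quality):
    # Convex per-pixel blend: weight 1 keeps the ground radar mosaic,
    # weight 0 falls back entirely to the machine-learned estimate.
    w = np.clip(radar_quality, 0.0, 1.0)
    return w * radar_field + (1.0 - w) * ml_field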

Time delay integration and in-pixel spatiotemporal filtering using a nanoscale digital CMOS focal plane readout

Summary

A digital focal plane array (DFPA) architecture has been developed that incorporates per-pixel full-dynamic-range analog-to-digital conversion and orthogonal-transfer-based real-time digital signal processing capability. Several long-wave infrared-optimized pixel-processing focal plane readout integrated circuit (ROIC) designs have been implemented, each accommodating a 256 x 256, 30-µm-pitch detector array. Demonstrated in this paper is the application of this DFPA ROIC architecture to problems of background pedestal mitigation, wide-field imaging, image stabilization, edge detection, and velocimetry. The DFPA architecture is reviewed, and pixel performance metrics are discussed in the context of the application examples. The measured data reported here are for DFPA ROICs implemented in 90-nm CMOS technology and hybridized to Hg(x)Cd(1-x)Te (MCT) detector arrays with cutoff wavelengths ranging from 7 to 14.5 µm and a specified operating temperature of 60-80 K.
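Of the applications listed, time delay integration is the easiest to sketch. The toy Python model below shows the shift-and-add that the DFPA performs digitally in the pixel array via orthogonal transfer; real hardware shifts counts between neighboring pixels rather than rolling an array, and edge wraparound is ignored here for brevity.

import numpy as np

def tdi_readout(frames, shift_per_frame=1):
    # Shift the accumulated digital counts in step with scene motion,
    # then add the new frame, so a moving target integrates coherently.
    acc = np.zeros_like(frames[0], dtype=np.int64)
    for frame in frames:
        acc = np.roll(acc, shift_per_frame, axis=0)   # orthogonal transfer
        acc += frame                                  # in-pixel accumulation
    return acc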

A method for correcting Fourier transform spectrometer (FTS) dynamic alignment errors

Published in:
SPIE Vol. 5425, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery X, 12-15 April 2004, pp. 443-455.

Summary

The Cross-track Infrared Sounder (CrIS), like most Fourier transform spectrometers, can be sensitive to mechanical disturbances during the time spectral data are collected. The Michelson interferometer within the spectrometer modulates input radiation at a frequency equal to the product of the wavenumber of the radiation and the constant optical path difference (OPD) velocity associated with the moving mirror. The modulation efficiency depends on the angular alignment of the two wavefronts exiting the spectrometer. Mechanical disturbances can cause errors in the alignment of the wavefronts, which manifest as noise in the spectrum. To mitigate these effects, CrIS will employ a laser to monitor alignment and dynamically correct the errors. Additionally, a vibration isolation system will damp disturbances imparted to the sensor from the spacecraft. Despite these efforts, residual noise may remain under certain conditions. Through simulation of CrIS data, we demonstrated an algorithmic technique to correct residual dynamic alignment errors. The technique requires only the time-dependent wavefront angle, sampled coincident with the interferogram, and the second derivative of the erroneous interferogram as inputs to compute the correction. The technique can function with raw interferograms on board the spacecraft, or with decimated interferograms on the ground. We were able to reduce the dynamic alignment noise by approximately a factor of ten in both cases. Performing the correction on the ground would require an increase in data rate of 1-2% over what is currently planned, in the form of 8-bit digitized angle data.
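A first-order version of such a correction can be sketched as follows. A small wavefront tilt reduces modulation efficiency by a factor quadratic in both the tilt angle and the optical frequency; in the interferogram domain, that frequency-squared weighting becomes a second derivative, matching the two inputs the abstract lists. The coefficient c and pupil radius below are placeholders of ours, not values from the paper.

import numpy as np

def correct_dynamic_alignment(igm, tilt_angle, c=0.25, radius=1.0):
    # igm: interferogram samples; tilt_angle: wavefront angle sampled
    # coincident with the interferogram (same length). Subtract the
    # tilt-induced term, which is proportional to angle^2 times the
    # interferogram's second derivative.
    d2 = np.gradient(np.gradient(igm))
    return igm - c * (radius * tilt_angle) ** 2 * d2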

A multi-threaded fast convolver for dynamically parallel image filtering

Published in:
J. Parallel Distrib. Comput., Vol. 63, No. 3, March 2003, pp. 360-372.

Summary

2D convolution is a staple of digital image processing. The advent of large-format imagers makes it possible to literally 'pave' the focal plane of an optical sensor with silicon, which results in very large images that can require a significant amount of computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field-of-view instruments) that must be compensated for in the filtering process by changing the filter across the image. This paper describes a fast (FFT-based) method for convolving images with slowly varying filters. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high-level interpreted language (IDL), while also exploiting open-standard vector libraries (VSIPL) and open-standard parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and have the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using fully portable code.
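The tiling idea behind such a fast convolver can be sketched briefly. Because the filter varies slowly, it is approximately constant within a tile, so each tile can be FFT-convolved with its local kernel, and the tiles processed as independent tasks, which is what makes a multi-threaded version load-balance well. The kernel_for_tile callback and the omission of tile-overlap handling are simplifications of ours, not the paper's implementation.

import numpy as np
from scipy.signal import fftconvolve

def spatially_varying_convolve(image, kernel_for_tile, tile=256):
    # Convolve each tile with the locally appropriate kernel; each
    # loop iteration is an independent task suitable for one thread.
    out = np.zeros(image.shape, dtype=float)
    for i in range(0, image.shape[0], tile):
        for j in range(0, image.shape[1], tile):
            block = image[i:i + tile, j:j + tile]
            k = kernel_for_tile(i // tile, j // tile)
            out[i:i + tile, j:j + tile] = fftconvolve(block, k, mode='same')
    return out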

CSKETCH image processing library

Published in:
MIT Lincoln Laboratory Report ATC-283

Summary

The CSKETCH image processing library is a collection of C++ classes and global functions that together comprise a development environment for meteorological algorithms. The library is best thought of as a 'toolkit' containing many standard mathematical and signal processing functions often employed in the analysis of weather radar data. A tutorial-style introduction to the library is given, complete with many examples of class and global function usage. Included is an in-depth look at the main class of the library, the SKArray class, a templatized and encapsulated class for storing numerical data arrays of one, two, or three dimensions. Following the tutorial is a complete reference for the library, which describes all publicly available class data members and member functions, as well as all global functions included in the library.

ASR-9 Weather Systems Processor (WSP) signal processing algorithms

Published in:
MIT Lincoln Laboratory Report ATC-255

Summary

Thunderstorm activity and associated low-altitude wind shear constitute a significant safety hazard to aviation, particularly during operations near airport terminals, where aircraft altitude is low and flight routes are constrained. The Federal Aviation Administration (FAA) has procured several dedicated meteorological sensors (the Terminal Doppler Weather Radar (TDWR) and the Network Expansion Low Level Wind Shear Alert System (LLWAS)) at major airports to enhance the safety and efficiency of operations during convective weather. A hardware and software modification to existing Airport Surveillance Radars (ASR-9), the Weather Systems Processor (WSP), will provide similar capabilities at much lower cost, thus allowing the FAA to extend its protection envelope to medium-density airports and airports where thunderstorm activity is less frequent. Following successful operational demonstrations of a prototype ASR-WSP, the FAA has procured approximately 35 WSPs for nationwide deployment. Lincoln Laboratory was responsible for development of all data processing algorithms, which were provided as Government Furnished Equipment (GFE) to be implemented by the full-scale development (FSD) contractor without modification. This report defines the operations used to produce images of atmospheric reflectivity, Doppler velocity, and data quality that are used by the WSP's meteorological product algorithms to generate automated information on hazardous wind shear and other phenomena. Principal requirements are suppression of interference (e.g., ground clutter, moving point targets, and meteorological and ground echoes originating from beyond the radar's unambiguous range), generation of meteorologically relevant images, and estimation of data quality. Hereafter, these operations are referred to as "signal processing" and the resulting images as "base data."
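As a concrete illustration of base data generation, the classic pulse-pair Doppler velocity estimate is sketched below in Python. This is the textbook method, shown generically; it is not the WSP's GFE implementation.

import numpy as np

def pulse_pair_velocity(iq, prf, wavelength):
    # iq: complex I/Q samples from successive pulses at one range gate.
    # The mean Doppler velocity follows from the phase of the lag-1
    # autocorrelation of the pulse sequence.
    r1 = np.mean(iq[1:] * np.conj(iq[:-1]))
    return -wavelength * prf / (4.0 * np.pi) * np.angle(r1)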

The Weather-Huffman method of data compression of weather images

Published in:
MIT Lincoln Laboratory Report ATC-261

Summary

Providing an accurate picture of the weather conditions in the pilot's area of interest is a highly useful application for ground-to-air datalinks. The problem with using datalinks to transmit weather graphics is the large number of bits required to exactly specify the weather image. To make transmission of weather images practical, a means must be found to compress the data to a size compatible with a limited datalink capacity. The Weather-Huffman (WH) algorithm developed in this report incorporates several subalgorithms in order to encode an input weather image as faithfully as possible within a specified datalink bit limitation. The main algorithm component is the encoding of a version of the input image via the Weather-Huffman run-length code, a variant of the standard Huffman code tailored to the peculiarities of weather images. If possible, the input map itself is encoded. Generally, however, a resolution-reduced version of the map must be created prior to encoding to meet the bit limitation. In that case, the output map will contain blocky regions, and higher weather-level areas will tend to bloom in size. Two routines are included in WH to overcome these problems. The first is a Smoother Process, which corrects the blocky edges of weather regions. The second, more powerful routine is the Extra Bit Algorithm (EBA). EBA utilizes all bits remaining in the message after the Huffman encoding to correct pixels set at too high a weather level. Both the size and shape of weather regions are adjusted by this algorithm. Pictorial examples of the operation of this algorithm on several severe weather images derived from NEXRAD are presented.
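The run-length-plus-Huffman core of the method is easy to sketch. The generic Python below builds run-length symbols from a scanline of weather levels and constructs a Huffman code over them; the report's weather-specific code table, bit budgeting, Smoother Process, and EBA refinement are not reproduced here.

import heapq
from collections import Counter

def run_lengths(pixels):
    # Collapse a scanline of weather levels into (level, run) symbols.
    runs, prev, n = [], pixels[0], 1
    for p in pixels[1:]:
        if p == prev:
            n += 1
        else:
            runs.append((prev, n))
            prev, n = p, 1
    runs.append((prev, n))
    return runs

def huffman_code(symbols):
    # Classic Huffman construction: repeatedly merge the two least
    # frequent nodes, prefixing '0'/'1' onto the codes beneath them.
    heap = [[n, [sym, ""]] for sym, n in Counter(symbols).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}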
