Publications

This looks like that: deep learning for interpretable image recognition

Published in:
Neural Info. Process., NIPS, 8-14 December 2019.

Summary

When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The algorithm thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, geologists, architects, and others would explain to people how to solve challenging image classification tasks. The network uses only image-level labels for training, meaning that there are no labels for parts of images. We demonstrate the method on the CIFAR-10 dataset and 10 classes from the CUB-200-2011 dataset.
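
The "find prototypical parts, then combine evidence" step can be illustrated with a small NumPy sketch. The summary does not say how patches are compared with prototypes, so the squared L2 distance, the log-ratio similarity, and the names `prototype_logits`, `prototypes`, and `class_weights` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def prototype_logits(features, prototypes, class_weights, eps=1e-4):
    """Score an image from its latent patches (illustrative sketch).

    features:      (H, W, D) latent patches from some feature extractor
    prototypes:    (P, D) learned prototypical parts (hypothetical values)
    class_weights: (P, C) weights connecting prototypes to classes
    """
    patches = features.reshape(-1, features.shape[-1])           # (H*W, D)
    # Squared L2 distance between every patch and every prototype (assumed metric).
    diffs = patches[:, None, :] - prototypes[None, :, :]         # (H*W, P, D)
    dists = np.sum(diffs ** 2, axis=-1)                          # (H*W, P)
    # Keep the closest patch per prototype: "this part looks like that prototype".
    min_dists = dists.min(axis=0)                                # (P,)
    # Assumed similarity: large when the distance is small.
    similarity = np.log((min_dists + 1.0) / (min_dists + eps))   # (P,)
    # Combine the per-prototype evidence into class scores.
    return similarity @ class_weights, similarity
```

Because only the per-prototype maximum evidence feeds the final scores, a sketch like this needs no part-level labels, which matches the image-level supervision described in the summary.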

Feature forwarding for efficient single image dehazing

Published in:
IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops, CVPRW, 16-17 June 2019.

Summary

Haze degrades content and obscures information of images, which can negatively impact vision-based decision-making in real-time systems. In this paper, we propose an efficient fully convolutional neural network (CNN) image dehazing method designed to run on edge graphics processing units (GPUs). We utilize three variants of our architecture to explore the dependency of dehazed image quality on parameter count and model design. The first two variants presented, a small and big version, make use of a single efficient encoder-decoder convolutional feature extractor. The final variant utilizes a pair of encoder-decoders for atmospheric light and transmission map estimation. Each variant ends with an image refinement pyramid pooling network to form the final dehazed image. For the big variant of the single-encoder network, we demonstrate state-of-the-art performance on the NYU Depth dataset. For the small variant, we maintain competitive performance on the superresolution O/I-HAZE datasets without the need for image cropping. Finally, we examine some challenges presented by the Dense-Haze dataset when leveraging CNN architectures for dehazing of dense haze imagery and examine the impact of loss function selection on image quality. Benchmarks are included to show the feasibility of introducing this approach into real-time systems.
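
To show how an atmospheric light estimate and a transmission map can be turned into a dehazed image, here is a minimal sketch. It assumes the widely used atmospheric scattering model, hazy = scene * t + A * (1 - t), which the summary does not state explicitly; the function name `recover_scene` and the floor `t_min` are illustrative choices, and the paper's refinement network is not modeled.

```python
import numpy as np

def recover_scene(hazy, transmission, atmospheric_light, t_min=0.1):
    """Invert hazy = scene * t + A * (1 - t) for the scene (assumed model).

    hazy:              (H, W, 3) image with values in [0, 1]
    transmission:      (H, W) estimated transmission map in (0, 1]
    atmospheric_light: (3,) estimated atmospheric light A
    t_min:             floor on t to avoid amplifying noise in dense haze
    """
    t = np.clip(transmission, t_min, 1.0)[..., None]   # (H, W, 1) for broadcasting
    scene = (hazy - atmospheric_light) / t + atmospheric_light
    return np.clip(scene, 0.0, 1.0)
```

The floor on the transmission hints at why dense haze is difficult: where t approaches zero, almost no scene radiance reaches the sensor and the division amplifies any estimation error.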

Multi-modal audio, video and physiological sensor learning for continuous emotion prediction

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity for investigating multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with minimal prosodic-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state space estimation approach is applied for score fusion that demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set with an achieved Concordance Correlation Coefficient (CCC) for arousal of 0.770 vs. 0.702 (baseline) and for valence of 0.687 vs. 0.638 (baseline). Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
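
For readers unfamiliar with the CCC metric quoted above, the following sketch computes it using the standard definition (twice the covariance divided by the sum of the two variances and the squared difference of means). The function name is illustrative, and the challenge's exact evaluation protocol, such as how recordings are concatenated, is not reproduced here.

```python
import numpy as np

def concordance_correlation(pred, gold):
    """Concordance correlation coefficient between two 1-D time series."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    # Population covariance, consistent with np.var's default (ddof=0).
    cov = np.mean((pred - mean_p) * (gold - mean_g))
    return 2.0 * cov / (pred.var() + gold.var() + (mean_p - mean_g) ** 2)
```

Unlike Pearson correlation, the CCC penalizes a constant offset or scale mismatch between prediction and reference, so a perfectly correlated but biased prediction scores below 1.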

Detecting depression using vocal, facial and semantic communication cues

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with a human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.
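
The three figures of merit quoted above can be computed as in the sketch below. It assumes that "mean F1" is the average of the F1 scores for the depressed and non-depressed classes, which the summary does not spell out; the function names and the 0/1 label encoding are illustrative.

```python
import numpy as np

def rmse(pred, gold):
    """Root-mean-square error of predicted PHQ scores."""
    err = np.asarray(pred, dtype=float) - np.asarray(gold, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(pred, gold):
    """Mean absolute error of predicted PHQ scores."""
    err = np.asarray(pred, dtype=float) - np.asarray(gold, dtype=float)
    return float(np.mean(np.abs(err)))

def f1(pred_labels, gold_labels, positive):
    """F1 score treating `positive` as the class of interest."""
    pred_labels = np.asarray(pred_labels)
    gold_labels = np.asarray(gold_labels)
    tp = np.sum((pred_labels == positive) & (gold_labels == positive))
    fp = np.sum((pred_labels == positive) & (gold_labels != positive))
    fn = np.sum((pred_labels != positive) & (gold_labels == positive))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def mean_f1(pred_labels, gold_labels):
    """Average of the F1 scores for classes 1 and 0 (assumed meaning of 'mean F1')."""
    return 0.5 * (f1(pred_labels, gold_labels, 1) + f1(pred_labels, gold_labels, 0))
```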

How deep neural networks can improve emotion recognition on video data

Published in:
ICIP: 2016 IEEE Int. Conf. on Image Processing, 25-28 September 2016.

Summary

We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving superior performance to the baseline and other competing methods.

The Offshore Precipitation Capability

Summary

In this work, machine learning and image processing methods are used to estimate radar-like precipitation intensity and echo top heights beyond the range of weather radar. The technology, called the Offshore Precipitation Capability (OPC), combines global lightning data with existing radar mosaics, five Geostationary Operational Environmental Satellite (GOES) channels, and several fields from the Rapid Refresh (RAP) 13 km numerical weather prediction model to create precipitation and echo top fields similar to those provided by existing Federal Aviation Administration (FAA) weather systems. Preprocessing and feature extraction methods are described to construct inputs for model training. A variety of machine learning algorithms are investigated to identify which provides the most accuracy. Output from the machine learning model is blended with existing radar mosaics to create weather radar-like analyses that extend into offshore regions. The resulting fields are validated using land radars and satellite precipitation measurements provided by the National Aeronautics and Space Administration (NASA) Global Precipitation Measurement Mission (GPM) core observatory satellite. This capability is initially being developed for the Miami Oceanic airspace with the goal of providing improved situational awareness for offshore air traffic control.

Time delay integration and in-pixel spatiotemporal filtering using a nanoscale digital CMOS focal plane readout

Summary

A digital focal plane array (DFPA) architecture has been developed that incorporates per-pixel full-dynamic-range analog-to-digital conversion and orthogonal-transfer-based real-time digital signal processing capability. Several long-wave infrared-optimized pixel processing focal plane readout integrated circuit (ROIC) designs have been implemented, each accommodating a 256 x 256 30-µm-pitch detector array. Demonstrated in this paper is the application of this DFPA ROIC architecture to problems of background pedestal mitigation, wide-field imaging, image stabilization, edge detection, and velocimetry. The DFPA architecture is reviewed, and pixel performance metrics are discussed in the context of the application examples. The measured data reported here are for DFPA ROICs implemented in 90-nm CMOS technology and hybridized to HgₓCd₁₋ₓTe (MCT) detector arrays with cutoff wavelengths ranging from 7 to 14.5 µm and a specified operating temperature of 60 K to 80 K.
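
The time delay integration named in the title can be illustrated in software: when the scene drifts across the array, the accumulated digital counts are shifted in step with it so that each scene point keeps integrating into the same running sum. The sketch below is only an assumption-laden simulation (a uniform rightward drift of a whole number of columns per frame, a hypothetical `tdi_accumulate` helper); it does not describe the ROIC circuitry.

```python
import numpy as np

def tdi_accumulate(frames, shift_per_frame=1):
    """Shift-and-add accumulation over a stack of frames (illustrative sketch).

    frames:          (T, H, W) digital counts per frame
    shift_per_frame: non-negative number of columns the scene moves right per frame
    """
    frames = np.asarray(frames, dtype=np.int64)
    acc = np.zeros(frames.shape[1:], dtype=np.int64)
    for frame in frames:
        # Move the running sums along with the scene, like counters handed to a neighbor.
        acc = np.roll(acc, shift_per_frame, axis=1)
        # Columns that just entered the field of view start from zero.
        acc[:, :shift_per_frame] = 0
        acc += frame
    return acc
```

A point source tracked for T frames ends up with T times its single-frame counts, which is the signal gain that motivates the technique.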

A method for correcting Fourier transform spectrometer (FTS) dynamic alignment errors

Published in:
SPIE Vol. 5425, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery X, 12-15 April 2004, pp. 443-455.

Summary

The Cross-track Infrared Sounder (CrIS), like most Fourier Transform spectrometers, can be sensitive to mechanical disturbances during the time spectral data is collected. The Michelson interferometer within the spectrometer modulates input radiation at a frequency equal to the product of the wavenumber of the radiation and the constant optical path difference (OPD) velocity associated with the moving mirror. The modulation efficiency depends on the angular alignment of the two wavefronts exiting the spectrometer. Mechanical disturbances can cause errors in the alignment of the wavefronts which manifest as noise in the spectrum. To mitigate these effects, CrIS will employ a laser to monitor alignment and dynamically correct the errors. Additionally, a vibration isolation system will damp disturbances imparted to the sensor from the spacecraft. Despite these efforts, residual noise may remain under certain conditions. Through simulation of CrIS data, we demonstrated an algorithmic technique to correct residual dynamic alignment errors. The technique requires only the time-dependent wavefront angle, sampled coincidentally with the interferogram, and the second derivative of the erroneous interferogram as inputs to compute the correction. The technique can function with raw interferograms on board the spacecraft, or with decimated interferograms on the ground. We were able to reduce the dynamic alignment noise by approximately a factor of ten in both cases. Performing the correction on the ground would require an increase in data rate of 1-2% over what is currently planned, in the form of 8-bit digitized angle data.
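
The relation between wavenumber, OPD velocity, and modulation frequency stated above can be checked numerically. In this sketch the wavenumber (1000 cm^-1), OPD velocity (5 cm/s), and sample rate are made-up illustrative values, not CrIS parameters, and the ideal monochromatic interferogram ignores the alignment errors the paper addresses.

```python
import numpy as np

def modulation_frequency_hz(wavenumber_per_cm, opd_velocity_cm_per_s):
    """Modulation frequency = wavenumber x OPD velocity."""
    return wavenumber_per_cm * opd_velocity_cm_per_s

def dominant_frequency_hz(signal, sample_rate_hz):
    """Frequency of the strongest non-DC component of a sampled signal."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return float(freqs[np.argmax(spectrum)])

# Illustrative (hypothetical) values.
sigma = 1000.0          # wavenumber in cm^-1
velocity = 5.0          # OPD velocity in cm/s
sample_rate = 50_000.0  # samples per second
t = np.arange(5000) / sample_rate
# Ideal interferogram of a monochromatic source: OPD grows linearly as velocity * t.
interferogram = 1.0 + np.cos(2.0 * np.pi * sigma * velocity * t)
```

Calling `dominant_frequency_hz(interferogram, sample_rate)` on this signal should agree with `modulation_frequency_hz(sigma, velocity)`, since the cosine completes `sigma * velocity` cycles per second.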

A multi-threaded fast convolver for dynamically parallel image filtering

Published in:
J. Parallel Distrib. Comput., Vol. 63, No. 3, March 2003, pp. 360-372.

Summary

2D convolution is a staple of digital image processing. The advent of large format imagers makes it possible to literally "pave" with silicon the focal plane of an optical sensor, which results in very large images that can require a significant amount of computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field of view instruments) which must be compensated for in the filtering process by changing the filter across the image. This paper describes a fast (FFT based) method for convolving images with slowly varying filters. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high level interpreted language (IDL), while also exploiting open standards vector libraries (VSIPL) and open standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and have the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using a fully portable code.
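
One common way to convolve with a slowly varying filter is to treat the filter as constant within each tile, convolve every tile with its local kernel via the FFT, and overlap-add the results. The sketch below follows that idea in NumPy; it is an assumption about the general approach rather than the paper's algorithm, and `kernel_for_block` is a hypothetical callback that must return same-shaped kernels.

```python
import numpy as np

def fft_convolve_full(tile, kernel):
    """Full 2D linear convolution of one tile with one kernel via the FFT."""
    shape = (tile.shape[0] + kernel.shape[0] - 1,
             tile.shape[1] + kernel.shape[1] - 1)
    spectrum = np.fft.rfft2(tile, s=shape) * np.fft.rfft2(kernel, s=shape)
    return np.fft.irfft2(spectrum, s=shape)

def convolve_varying(image, kernel_for_block, block=32):
    """Overlap-add convolution with a kernel that may change from tile to tile.

    kernel_for_block(r, c) returns the kernel for the tile whose top-left
    corner is at row r, column c (hypothetical interface).
    """
    kh, kw = kernel_for_block(0, 0).shape
    out = np.zeros((image.shape[0] + kh - 1, image.shape[1] + kw - 1))
    for r in range(0, image.shape[0], block):
        for c in range(0, image.shape[1], block):
            tile = image[r:r + block, c:c + block]
            piece = fft_convolve_full(tile, kernel_for_block(r, c))
            # Tails of neighboring tiles overlap and are summed.
            out[r:r + piece.shape[0], c:c + piece.shape[1]] += piece
    return out
```

Each tile is processed independently before the final summation, which is what makes this kind of scheme a natural fit for the multi-threaded load balancing described in the summary.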

CSKETCH image processing library

Published in:
MIT Lincoln Laboratory Report ATC-283

Summary

The CSKETCH image processing library is a collection of C++ classes and global functions which comprise a development environment for meteorological algorithms. The library is best thought of as a "tool-kit" which contains many standard mathematical and signal processing functions often employed in the analysis of weather radar data. A tutorial-style introduction to the library is given, complete with many examples of class and global function usage. Included is an in-depth look at the main class of the library, the SKArray class, which is a templatized and encapsulated class for storing numerical data arrays of one, two, or three dimensions. Following the tutorial is a complete reference for the library which describes all publicly available class data members and class member functions, as well as all global functions included in the library.
