Publications

Predicting and analyzing factors in patent litigation

Published in:
30th Conf. on Neural Information Processing Systems, NIPS 2016, 5-10 December 2016.

Summary

Patent litigation is an expensive and time-consuming process. To minimize its impact on the participants in the patent lifecycle, automatic determination of litigation potential is a compelling machine learning application. In this paper, we consider preliminary methods for the prediction of a patent being involved in litigation using metadata, content, and graph features. Metadata features are top-level, easily extractable features, e.g., assignee and number of claims. The content feature performs lexical analysis of the claims associated with a patent. Graph features use relational learning to summarize patent references. We apply our methods to US patents using a labeled data set. Prior work has focused on metadata-only features, but we show that both graph and content features have significant predictive capability. Additionally, fusing all features results in improved performance. We also perform a preliminary examination of some of the qualitative factors that may have significant importance in patent litigation.

Writing your first paper: from code to research

Published in:
Grace Hopper Celebration of Women in Computing, 19-21 October 2016.

Summary

'Publish or perish,' once a term used to refer to the pressure placed on professors to publish their research, has since expanded to apply to students and professionals in industry. There are numerous benefits to doing research and publishing the results, including personal satisfaction, career advancement, and prestige. In this session we will discuss how to begin doing research and write a first paper.

Multi-modal audio, video and physiological sensor learning for continuous emotion prediction

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity for investigating multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It makes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with minimal prosodic-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state space estimation approach is applied for score fusion that demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set, achieving a Concordance Correlation Coefficient (CCC) of 0.770 vs 0.702 (baseline) for arousal and 0.687 vs 0.638 for valence. Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
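
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a concordance correlation coefficient is commonly computed, using the standard definition 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²) with population statistics. The function name `concordance_cc` and the sample values are illustrative assumptions; this is not the challenge's official scoring code.

```python
from statistics import fmean


def concordance_cc(predicted, reference):
    """Concordance correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration; uses population (divide-by-N)
    variance and covariance.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p, mean_r = fmean(predicted), fmean(reference)
    var_p = fmean([(p - mean_p) ** 2 for p in predicted])
    var_r = fmean([(r - mean_r) ** 2 for r in reference])
    cov = fmean([(p - mean_p) * (r - mean_r)
                 for p, r in zip(predicted, reference)])
    # The squared mean difference penalizes systematic offset (bias).
    denom = var_p + var_r + (mean_p - mean_r) ** 2
    if denom == 0:
        raise ValueError("undefined when both inputs are the same constant")
    return 2 * cov / denom
```

Unlike Pearson correlation, this measure drops when predictions are offset from the reference: under this definition, a prediction track that follows the reference perfectly but is shifted by a constant scores below 1.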

Detecting depression using vocal, facial and semantic communication cues

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing, and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with the human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.

How deep neural networks can improve emotion recognition on video data

Published in:
ICIP: 2016 IEEE Int. Conf. on Image Processing, 25-28 September 2016.

Summary

We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving superior performance to the baseline and other competing methods.

Sparse-coded net model and applications

Published in:
2016 IEEE Int. Workshop on Machine Learning for Signal Processing, 13-16 September 2016.

Summary

As an unsupervised learning method, sparse coding can discover high-level representations for an input in a large variety of learning problems. Under semi-supervised settings, sparse coding is used to extract features for a supervised task such as classification. While sparse representations learned from unlabeled data independently of the supervised task perform well, we argue that sparse coding should also be built as a holistic learning unit optimizing on the supervised task objectives more explicitly. In this paper, we propose sparse-coded net, a feedforward model that integrates sparse coding and task-driven output layers, and describe training methods in detail. After pretraining a sparse-coded net via semi-supervised learning, we optimize its task-specific performance in a novel backpropagation algorithm that can traverse nonlinear feature pooling operators to update the dictionary. Thus, sparse-coded net can be applied to supervised dictionary learning. We evaluate sparse-coded net with classification problems in sound, image, and text data. The results confirm a significant improvement over semi-supervised learning as well as superior classification performance against deep stacked autoencoder neural network and GMM-SVM pipelines in small to medium-scale settings.

I-vector speaker and language recognition system on Android

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

I-Vector-based speaker and language identification provides state-of-the-art performance. However, this comes as a more computationally complex solution, which can often lead to challenges on resource-limited devices, such as phones or tablets. We present the implementation of an I-Vector speaker and language recognition system on the Android platform in the form of a fully functional application that allows speaker enrollment and language/speaker scoring within mobile contexts. We include a detailed account of the challenges of porting the system and its dependencies, which were necessary to optimize matrix operations in the I-Vector implementation. The system was benchmarked on a Google Nexus 6, showing a speed increase of 61.68% in scoring and 82.63% in enrollment operations with the implemented optimizations. The application was tested in mobile settings on a Nexus 7 tablet with forty participants, showing a rough accuracy of 84%. The optimized platform showed the capacity to perform near real-time recognition within a mobile setting and showcases the viability of I-Vector systems in resource-limited environments.

Speaker linking and applications using non-parametric hashing methods

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

Large unstructured audio data sets have become ubiquitous and present a challenge for organization and search. One logical approach for structuring data is to find common speakers and link occurrences across different recordings. Prior approaches to this problem have focused on basic methodology for the linking task. In this paper, we introduce a novel trainable nonparametric hashing method for indexing large speaker recording data sets. This approach leads to tunable computational complexity methods for speaker linking. We focus on a scalable clustering method based on hashing canopy-clustering. We apply this method to a large corpus of speaker recordings, demonstrate performance tradeoffs, and compare to other hashing methods.

Corpora for the evaluation of robust speaker recognition systems

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

The goal of this paper is to describe significant corpora available to support speaker recognition research and evaluation, along with details about the corpora collection and design. We describe the attributes of high-quality speaker recognition corpora. Considerations of the application, domain, and performance metrics are also discussed. Additionally, a literature survey of corpora used in speaker recognition research over the last 10 years is presented. Finally, we show the most common corpora used in the research community and review them on their success in enabling meaningful speaker recognition research.

Relating estimated cyclic spectral peak frequency to measured epilarynx length using magnetic resonance imaging

Published in:
INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

The epilarynx plays an important role in speech production, carrying information about the individual speaker and manner of articulation. However, precise acoustic behavior of this lower vocal tract structure is difficult to establish. Focusing on acoustics observable in natural speech, recent spectral processing techniques isolate a unique resonance with characteristics of the epilarynx previously shown via simulation, specifically cyclicity (i.e., energy differences between the closed and open phases of the glottal cycle) in a 3-5 kHz region observed across vowels. Using Magnetic Resonance Imaging (MRI), the present work relates this estimated cyclic peak frequency to measured epilarynx length. Assuming a simple quarter wavelength relationship, the cavity length estimated from the cyclic peak frequency is shown to be directly proportional (linear fit slope = 1.1) and highly correlated (ρ = 0.85, p-value < 10^-4) to the measured epilarynx length across speakers. Results are discussed, as are implications in speech science and application domains.
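
As a rough illustration of the quarter wavelength relationship mentioned above, the sketch below converts a resonance frequency into a cavity length using the textbook formula for a tube closed at one end, L = c / (4f). The function name and the speed-of-sound value of 350 m/s are assumptions made for this example; the abstract does not state which value the authors used.

```python
# Assumed speed of sound in warm, humid air; not taken from the paper.
SPEED_OF_SOUND_M_PER_S = 350.0


def quarter_wave_length_m(resonance_hz, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Length (in meters) of a quarter-wave resonator with the given resonance.

    Hypothetical helper: applies L = c / (4 * f) for a tube closed at one end.
    """
    if resonance_hz <= 0:
        raise ValueError("resonance frequency must be positive")
    return speed_of_sound / (4.0 * resonance_hz)


# Lengths implied by the edges of the 3-5 kHz region, in centimeters.
for freq_hz in (3000.0, 5000.0):
    print(freq_hz, round(100 * quarter_wave_length_m(freq_hz), 2))
```

Under these assumptions, higher peak frequencies map to shorter cavities, so the 3-5 kHz region corresponds to lengths on the order of a few centimeters.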