Publications

Refine Results

(Filters Applied) Clear All

Predicting and analyzing factors in patent litigation

Published in:
30th Conf. on Neural Information Processing System, NIPS 2016, 5-10 December 2016.

Summary

Patent litigation is an expensive and time-consuming process. To minimize its impact on the participants in the patent lifecycle, automatic determination of litigation potential is a compelling machine learning application. In this paper, we consider preliminary methods for the prediction of a patent being involved in litigation using metadata, content, and graph features. Metadata features are top-level easily-extractable features, i.e., assignee, number of claims, etc. The content feature performs lexical analysis of the claims associated to a patent. Graph features use relational learning to summarize patent references. We apply our methods on US patents using a labeled data set. Prior work has focused on metadata-only features, but we show that both graph and content features have significant predictive capability. Additionally, fusing all features results in improved performance. We also perform a preliminary examination of some of the qualitative factors that may have significant importance in patent litigation.
READ LESS

Summary

Patent litigation is an expensive and time-consuming process. To minimize its impact on the participants in the patent lifecycle, automatic determination of litigation potential is a compelling machine learning application. In this paper, we consider preliminary methods for the prediction of a patent being involved in litigation using metadata, content...

READ MORE

Making #sense of #unstructured text data

Published in:
30th Conf. on Neural Info. Processing Syst., NIPS 2016, 5-10 December 2016.

Summary

Automatic extraction of intelligent and useful information from data is one of the main goals in data science. Traditional approaches have focused on learning from structured features, i.e., information in a relational database. However, most of the data encountered in practice are unstructured (i.e., social media posts, forums, emails and web logs); they do not have a predefined schema or format. In this work, we examine unsupervised methods for processing unstructured text data, extracting relevant information, and transforming it into structured information that can then be leveraged in various applications such as graph analysis and matching entities across different platforms. Various efforts have been proposed to develop algorithms for processing unstructured text data. At a top level, text can be either summarized by document level features (i.e., language, topic, genre, etc.) or analyzed at a word or sub-word level. Text analytics can be unsupervised, semi-supervised, or supervised. In this work, we focus on word analysis and unsupervised methods. Unsupervised (or semi-supervised) methods require less human annotation and can easily fulfill the role of automatic analysis. For text analysis, we focus on methods for finding relevant words in the text. Specifically, we look at social media data and attempt to predict hashtags for users' posts. The resulting hashtags can be used for downstream processing such as graph analysis. Automatic hashtag annotation is closely related to automatic tag extraction and keyword extraction. Techniques for hashtags extraction include topic analysis, supervised classifiers, machine translation methods, and collaborative filtering. Methods for keyword extraction include graph-based and topical analysis of text.
READ LESS

Summary

Automatic extraction of intelligent and useful information from data is one of the main goals in data science. Traditional approaches have focused on learning from structured features, i.e., information in a relational database. However, most of the data encountered in practice are unstructured (i.e., social media posts, forums, emails and...

READ MORE

An overview of the DARPA Data Driven Discovery of Models (D3M) Program

Published in:
29th Conf. on Neural Information Processing Systems, NIPS, 5-10 December 2016.

Summary

A new DARPA program called Data Driven Discovery of Models (D3M) aims to develop automated model discovery systems that can be used by researchers with specific subject matter expertise to create empirical models of real, complex processes. Two major goals of this program are to allow experts to create empirical models without the need for data scientists and to increase the productivity of data scientists via automation. Automated model discovery systems developed will be tested on real-world problems that progressively get harder during the course of the program. Toward the end of the program, problems will be both unsolved and underspecified in terms of data and desired outcomes. The program will emphasize creating and leveraging open source technology and architecture. Our presentation reviews the goals and structure of this program which will begin early in 2017. Although the deadline for submitting proposals has past, we welcome suggestions concerning challenge tasks, evaluations, or new open-source data sets to be included for system development and evaluation that would supplement data currently being curated from many sources.
READ LESS

Summary

A new DARPA program called Data Driven Discovery of Models (D3M) aims to develop automated model discovery systems that can be used by researchers with specific subject matter expertise to create empirical models of real, complex processes. Two major goals of this program are to allow experts to create empirical...

READ MORE

Leveraging data provenance to enhance cyber resilience

Summary

Building secure systems used to mean ensuring a secure perimeter, but that is no longer the case. Today's systems are ill-equipped to deal with attackers that are able to pierce perimeter defenses. Data provenance is a critical technology in building resilient systems that will allow systems to recover from attackers that manage to overcome the "hard-shell" defenses. In this paper, we provide background information on data provenance, details on provenance collection, analysis, and storage techniques and challenges. Data provenance is situated to address the challenging problem of allowing a system to "fight-through" an attack, and we help to identify necessary work to ensure that future systems are resilient.
READ LESS

Summary

Building secure systems used to mean ensuring a secure perimeter, but that is no longer the case. Today's systems are ill-equipped to deal with attackers that are able to pierce perimeter defenses. Data provenance is a critical technology in building resilient systems that will allow systems to recover from attackers...

READ MORE

Building low-power trustworthy systems: cyber-security considerations for real-time physiological status monitoring

Summary

Real-time monitoring of physiological data can reduce the likelihood of injury in noncombat military personnel and first-responders. MIT Lincoln Laboratory is developing a tactical Real-Time Physiological Status Monitoring (RT-PSM) system architecture and reference implementation named OBAN (Open Body Area Network), the purpose of which is to provide an open, government-owned framework for integrating multiple wearable sensors and applications. The OBAN implementation accepts data from various sensors enabling calculation of physiological strain information which may be used by squad leaders or medics to assess the team's health and enhance safety and effectiveness of mission execution. Security in terms of measurement integrity, confidentiality, and authenticity is an area of interest because OBAN system components exchange sensitive data in contested environments. In this paper, we analyze potential cyber-security threats and their associated risks to a generalized version of the OBAN architecture and identify directions for future research. The threat analysis is intended to inform the development of secure RT-PSM architectures and implementations.
READ LESS

Summary

Real-time monitoring of physiological data can reduce the likelihood of injury in noncombat military personnel and first-responders. MIT Lincoln Laboratory is developing a tactical Real-Time Physiological Status Monitoring (RT-PSM) system architecture and reference implementation named OBAN (Open Body Area Network), the purpose of which is to provide an open, government-owned...

READ MORE

Writing your first paper: from code to research

Published in:
Grace Hopper Celebration of Women in Computing, 19-21 October 2016.

Summary

'Publish or perish,' once a term used to refer to the pressure placed on professors to publish their research has since expanded to apply to students and professionals in industry. There are numerous benefits to doing research and publishing the results, including personal satisfaction, career advancement, and prestige. In this session we will discuss how to begin doing research and write a first paper.
READ LESS

Summary

'Publish or perish,' once a term used to refer to the pressure placed on professors to publish their research has since expanded to apply to students and professionals in industry. There are numerous benefits to doing research and publishing the results, including personal satisfaction, career advancement, and prestige. In this...

READ MORE

Multi-modal audio, video and physiological sensor learning for continuous emotion prediction

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity for investigating multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with minimal prosodic-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state space estimation approach is applied for score fusion that demonstrates the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set with an achieved Concordant Correlation Coefficient (CCC) for arousal of 0.770 vs 0.702 (baseline) and for valence of 0.687 vs 0.638. Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
READ LESS

Summary

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications including biomedical diagnostics, multimedia retrieval, and human computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the...

READ MORE

Detecting depression using vocal, facial and semantic communication cues

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with the human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.
READ LESS

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In...

READ MORE

Side channel authenticity discriminant analysis for device class identification

Summary

Counterfeit microelectronics present a significant challenge to commercial and defense supply chains. Many modern anti-counterfeit strategies rely on manufacturer cooperation to include additional identification components. We instead propose Side Channel Authenticity Discriminant Analysis (SICADA) to leverage physical phenomena manifesting from device operation to match suspect parts to a class of authentic parts. This paper examines the extent that power dissipation information can be used to separate unique classes of devices. A methodology for distinguishing device types is presented and tested on both simulation data of a custom circuit and empirical measurements of Microchip dsPIC33F microcontrollers. Experimental results show that power side channels contain significant distinguishing information to identify parts as authentic or suspect counterfeit.
READ LESS

Summary

Counterfeit microelectronics present a significant challenge to commercial and defense supply chains. Many modern anti-counterfeit strategies rely on manufacturer cooperation to include additional identification components. We instead propose Side Channel Authenticity Discriminant Analysis (SICADA) to leverage physical phenomena manifesting from device operation to match suspect parts to a class of...

READ MORE

How deep neural networks can improve emotion recognition on video data

Published in:
ICIP: 2016 IEEE Int. Conf. on Image Processing, 25-28 September 2016.

Summary

We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving superior performance to the baseline and other competing methods.
READ LESS

Summary

We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this...

READ MORE