Publications

A fun and engaging interface for crowdsourcing named entities

Published in:
10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Summary

Many current problems in natural language processing are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus has already been annotated. Annotating corpora for natural language processing tasks is typically a time-consuming and expensive process. In this paper, we provide a case study in using crowdsourcing to curate an in-domain corpus for named entity recognition, a common problem in natural language processing. In particular, we present our use of fun, engaging user interfaces as a way to entice workers to partake in our crowdsourcing task without inflating our payments in a way that would attract more mercenary workers than conscientious ones. Additionally, we provide a survey of alternate interfaces for collecting annotations of named entities and compare our approach to those systems.

Generating a multiple-prerequisite attack graph

Feedback-based social media filtering tool for improved situational awareness

Published in:
15th Annual IEEE Int. Symp. on Technologies for Homeland Security, HST 2016, 10-12 May 2016.

Summary

This paper describes a feature-rich model of data relevance, designed to aid first responder retrieval of useful information from social media sources during disasters or emergencies. The approach is meant to address the failure of traditional keyword-based methods to sufficiently suppress clutter during retrieval. The model iteratively incorporates relevance feedback to update feature space selection and classifier construction across a multimodal set of diverse content characterization techniques. This approach is advantageous because the aspects of the data (or even the modalities of the data) that signify relevance cannot always be anticipated ahead of time. Experiments with both microblog text documents and coupled imagery and text documents demonstrate the effectiveness of this model on sample retrieval tasks, in comparison to more narrowly focused models operating in a priori selected feature spaces. The experiments also show that even relatively low feedback levels (i.e., tens of examples) can lead to a significant performance boost during the interactive retrieval process.

A reverse approach to named entity extraction and linking in microposts

Published in:
Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 67-69.

Summary

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions that are not co-referent with any entities in the knowledge base.

Named entity recognition in 140 characters or less

Published in:
Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 78-79.

Summary

In this paper, we explore the problem of recognizing named entities in microposts, a genre with notoriously little context surrounding each named entity and inconsistent use of grammar, punctuation, capitalization, and spelling conventions by authors. In spite of the challenges associated with information extraction from microposts, it remains an increasingly important genre. This paper presents the MIT Information Extraction Toolkit (MITIE) and explores its adaptability to the micropost genre.

Blind signal classification via sparse coding

Published in:
IEEE Int. Conf. Computer Communications, IEEE INFOCOM 2016, 10-15 April 2016.

Summary

We propose a novel RF signal classification method based on sparse coding, an unsupervised learning method popular in computer vision. In particular, we employ a convolutional sparse coder that can extract high-level features by computing the maximal similarity between an unknown received signal and an overcomplete dictionary of matched filter templates. Such a dictionary can be either generated or trained in an unsupervised fashion from signal examples that carry no ground-truth labels. The computed sparse code is then used to train SVM classifiers to discriminate RF signals. As a result, the proposed approach can achieve blind signal classification that requires no prior knowledge (e.g., MCS, pulse shaping) about the signals present in an arbitrary RF channel. Since modulated RF signals undergo pulse shaping to aid matched filter detection by a receiver for the same radio protocol, our method can exploit variability in relative similarity against the dictionary atoms as the key discriminating factor for the SVM. We present an empirical validation of our approach. The results indicate that we can separate different classes of digitally modulated signals from blind sampling with 70.3% recall and 24.6% false alarm at 10 dB SNR. If a labeled dataset is available for supervised classifier training, we can enhance the classification accuracy to 87.8% recall and 14.1% false alarm.
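
The feature-extraction idea in this summary (maximal similarity between a received signal and each dictionary template) can be illustrated with a minimal sketch. This is not the paper's convolutional sparse coder: it is a simplified stand-in that slides each template over a real-valued signal and keeps the largest normalized correlation. The function names, the toy templates, and the real-valued signal are all assumptions made for illustration; the resulting feature vector is the kind of input a classifier such as an SVM could be trained on.

```python
import math


def max_normalized_correlation(signal, template):
    """Largest |cosine similarity| between the template and any
    same-length window of the signal (hypothetical helper)."""
    m = len(template)
    t_norm = math.sqrt(sum(x * x for x in template))
    best = 0.0
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        w_norm = math.sqrt(sum(x * x for x in window))
        if w_norm == 0.0 or t_norm == 0.0:
            continue  # skip all-zero windows or templates
        dot = sum(a * b for a, b in zip(window, template))
        best = max(best, abs(dot) / (w_norm * t_norm))
    return best


def dictionary_features(signal, dictionary):
    """One feature per dictionary atom: its best match anywhere in the signal."""
    return [max_normalized_correlation(signal, atom) for atom in dictionary]


# Toy example: two made-up pulse templates and a signal containing the first one.
square_pulse = [1.0, 1.0, -1.0, -1.0]
alternating_pulse = [1.0, -1.0, 1.0, -1.0]
received = [0.0, 0.0, 1.0, 1.0, -1.0, -1.0, 0.0, 0.0]
features = dictionary_features(received, [square_pulse, alternating_pulse])
```

In this toy case the first feature is close to 1 because the signal contains an exact copy of the first template, while the second feature is much smaller; that difference in relative similarity across atoms is what a downstream classifier would exploit.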

Competing cognitive resilient networks

Published in:
IEEE Trans. Cognit. Commun. and Netw., Vol. 2, No. 1, March 2016, pp. 95-109.

Summary

We introduce the competing cognitive resilient network (CCRN), a network of mobile radios challenged to optimize data throughput and networking efficiency under dynamic spectrum access and adversarial threats (e.g., jamming). Unlike conventional approaches, a CCRN features both communicator and jamming nodes in a friendly coalition that takes joint actions against hostile networking entities. In particular, this paper showcases hypothetical blue force and red force CCRNs and their competition for open spectrum resources. We present state-agnostic and stateful solution approaches based on a decision-theoretic framework. The state-agnostic approach builds on the multiarmed bandit to develop an optimal strategy that enables exploratory-exploitative actions from sequential sampling of channel rewards. The stateful approach explicitly models states and actions from an underlying Markov decision process and uses multiagent Q-learning to compute optimal node actions. We provide a theoretical framework for CCRN and propose new algorithms for both approaches. Simulation results indicate that the proposed algorithms outperform some of the most important algorithms known to date.
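
To make the state-agnostic idea concrete, the sketch below shows a generic multiarmed bandit choosing among channels by sequentially sampling their rewards. It uses the standard UCB1 rule, which is an assumption chosen for illustration and not necessarily the algorithm proposed in the paper. The channel success probabilities, the Bernoulli reward model, and the function names are likewise hypothetical.

```python
import math
import random


def ucb1_select(counts, totals, t):
    """Pick a channel index using the UCB1 rule."""
    # Sample every channel once before trusting any estimate.
    for channel, n in enumerate(counts):
        if n == 0:
            return channel
    # Mean reward plus an exploration bonus that shrinks as a
    # channel accumulates more samples.
    scores = [
        totals[c] / counts[c] + math.sqrt(2.0 * math.log(t) / counts[c])
        for c in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(success_probs, rounds, seed=0):
    """Simulate channel selection with Bernoulli rewards (assumed model).

    Returns how many times each channel was chosen.
    """
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    totals = [0.0] * len(success_probs)
    for t in range(1, rounds + 1):
        c = ucb1_select(counts, totals, t)
        reward = 1.0 if rng.random() < success_probs[c] else 0.0
        counts[c] += 1
        totals[c] += reward
    return counts


# Three hypothetical channels; the third delivers packets most reliably.
pulls = run_bandit([0.2, 0.5, 0.9], rounds=2000)
```

Over enough rounds the selection counts concentrate on the most reliable channel while the others are still sampled occasionally, which is the exploration-exploitation trade-off the summary refers to.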

Recommender systems for the Department of Defense and intelligence community

Summary

Recommender systems, which selectively filter information for users, can hasten analysts' responses to complex events such as cyber attacks. Lincoln Laboratory's research on recommender systems may bring the capabilities of these systems to analysts in both the Department of Defense and the intelligence community.

Finding malicious cyber discussions in social media

Summary

Today's analysts manually examine social media networks to find discussions concerning planned cyber attacks, attacker techniques and tools, and potential victims. Applying modern machine learning approaches, Lincoln Laboratory has demonstrated the ability to automatically discover such discussions from Stack Exchange, Reddit, and Twitter posts written in English.

Analysis of factors affecting system performance in the ASpIRE challenge

Published in:
2015 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2015, 13-17 December 2015.

Summary

This paper presents an analysis of factors affecting system performance in the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. In particular, overall word error rate (WER) of the solver systems is analyzed as a function of room, distance between talker and microphone, and microphone type. We also analyze speech activity detection performance of the solver systems and investigate its relationship to WER. The primary goal of the paper is to provide insight into the factors affecting system performance in the ASpIRE evaluation set across many systems given annotations and metadata that are not available to the solvers. This analysis will inform the design of future challenges and provide insight into the efficacy of current solutions addressing noisy reverberant speech in mismatched conditions.
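
For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence and pools it per recording condition. The condition labels and transcripts are invented for illustration, and the paper's actual scoring tools and text normalization are not reproduced here.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_condition(utterances):
    """Pooled WER per condition from (condition, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for condition, reference, hypothesis in utterances:
        ref = reference.split()
        hyp = hypothesis.split()
        errors[condition] += edit_distance(ref, hyp)
        words[condition] += len(ref)
    return {c: errors[c] / words[c] for c in errors if words[c] > 0}


# Hypothetical utterances grouped by talker-to-microphone distance.
sample = [
    ("near", "turn the lights on", "turn the light on"),
    ("near", "hello", "hello"),
    ("far", "open the door", "open door please"),
]
rates = wer_by_condition(sample)
```

Pooling errors and reference words before dividing weights each condition by how much speech it contains, rather than averaging per-utterance rates.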