Publications

By Joseph P. Campbell Jr

Combating Misinformation: HLT Highlights from MIT Lincoln Laboratory

March 17, 2021

Presentation

Author:

Joseph P. Campbell Jr

Published in:

Human Language Technology Conference (HLTCon), 16-18 March 2021.

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Dr. Joseph Campbell shares several human language technology highlights from MIT Lincoln Laboratory. These include key enabling technologies for combating misinformation by linking personas, analyzing content, and understanding human networks. Developing operationally relevant technologies requires access to corresponding data with meaningful evaluations, as Dr. Douglas Reynolds presented in his keynote. As Dr. Danelle Shah discussed in her keynote, it’s crucial to develop these technologies to operate at deeper levels than the surface. Producing reliable information from the fusion of missing and inherently unreliable information channels is paramount. Furthermore, the dynamic misinformation environment and the coevolution of allied methods with adversarial methods represent additional challenges.

Making #sense of #unstructured text data

December 5, 2016

Conference Paper

Author:

Lin Li

…

Published in:

30th Conf. on Neural Info. Processing Syst., NIPS 2016, 5-10 December 2016.

Topic:

artificial intelligence

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Automatic extraction of intelligent and useful information from data is one of the main goals in data science. Traditional approaches have focused on learning from structured features, e.g., information in a relational database. However, most of the data encountered in practice are unstructured (e.g., social media posts, forums, emails, and web logs); they do not have a predefined schema or format. In this work, we examine unsupervised methods for processing unstructured text data, extracting relevant information, and transforming it into structured information that can then be leveraged in various applications such as graph analysis and matching entities across different platforms. Various efforts have been proposed to develop algorithms for processing unstructured text data. At a top level, text can be either summarized by document-level features (e.g., language, topic, genre) or analyzed at a word or sub-word level. Text analytics can be unsupervised, semi-supervised, or supervised. In this work, we focus on word analysis and unsupervised methods. Unsupervised (or semi-supervised) methods require less human annotation and can easily fulfill the role of automatic analysis. For text analysis, we focus on methods for finding relevant words in the text. Specifically, we look at social media data and attempt to predict hashtags for users' posts. The resulting hashtags can be used for downstream processing such as graph analysis. Automatic hashtag annotation is closely related to automatic tag extraction and keyword extraction. Techniques for hashtag extraction include topic analysis, supervised classifiers, machine translation methods, and collaborative filtering. Methods for keyword extraction include graph-based and topical analysis of text.
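To make the idea of finding relevant words without annotation concrete, here is a minimal sketch that ranks the words of each post by TF-IDF and offers the top-ranked words as hashtag candidates. TF-IDF is a stand-in chosen for brevity, not the method used in the paper, and the names `tokenize` and `suggest_hashtags` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest_hashtags(posts: list[str], top_k: int = 2) -> list[list[str]]:
    """Suggest up to top_k hashtag candidates per post using TF-IDF scores.

    Illustrative assumption: a word is "relevant" to a post when it is
    frequent in that post but rare across the whole collection.
    """
    docs = [tokenize(p) for p in posts]
    n_docs = len(docs)

    # Document frequency: the number of posts that contain each word.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    suggestions = []
    for tokens in docs:
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        suggestions.append(["#" + w for w in ranked[:top_k]])
    return suggestions
```

Words that occur in every post receive a score of zero, so common function words sink to the bottom of the ranking without a stop-word list. A practical system would likely need better tokenization and a larger collection than this sketch assumes.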

LLTools: machine learning for human language processing

December 5, 2016

Conference Paper

Author:

Cagri K. Dagli

…

Published in:

30th Conf. on Neural Info. Processing Syst., NIPS 2016, 5-10 December 2016.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Machine learning methods in Human Language Technology have reached a stage of maturity where widespread use is both possible and desirable. The MIT Lincoln Laboratory LLTools software suite provides a step towards this goal by providing a set of easily accessible frameworks for incorporating speech, text, and entity resolution components into larger applications. For the speech processing component, the pySLGR (Speaker, Language, Gender Recognition) tool provides signal processing, standard feature analysis, speech utterance embedding, and machine learning modeling methods in Python. The text processing component in LLTools extracts semantically meaningful insights from unstructured data via entity extraction, topic modeling, and document classification. The entity resolution component in LLTools provides approximate string matching, author recognition, and graph-based methods for identifying and linking different instances of the same real-world entity. We show through two applications that LLTools can be used to rapidly create and train research prototypes for human language processing.

An overview of the DARPA Data Driven Discovery of Models (D3M) Program

December 5, 2016

Conference Paper

Author:

Richard P. Lippmann

…

Published in:

30th Conf. on Neural Information Processing Systems, NIPS 2016, 5-10 December 2016.

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

A new DARPA program called Data Driven Discovery of Models (D3M) aims to develop automated model discovery systems that can be used by researchers with specific subject matter expertise to create empirical models of real, complex processes. Two major goals of this program are to allow experts to create empirical models without the need for data scientists and to increase the productivity of data scientists via automation. Automated model discovery systems developed will be tested on real-world problems that progressively get harder during the course of the program. Toward the end of the program, problems will be both unsolved and underspecified in terms of data and desired outcomes. The program will emphasize creating and leveraging open source technology and architecture. Our presentation reviews the goals and structure of this program, which will begin early in 2017. Although the deadline for submitting proposals has passed, we welcome suggestions concerning challenge tasks, evaluations, or new open-source data sets to be included for system development and evaluation that would supplement data currently being curated from many sources.

Predicting and analyzing factors in patent litigation

December 5, 2016

Conference Paper

Author:

William M. Campbell

…

Published in:

30th Conf. on Neural Information Processing Systems, NIPS 2016, 5-10 December 2016.

Topic:

artificial intelligence

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Patent litigation is an expensive and time-consuming process. To minimize its impact on the participants in the patent lifecycle, automatic determination of litigation potential is a compelling machine learning application. In this paper, we consider preliminary methods for the prediction of a patent being involved in litigation using metadata, content, and graph features. Metadata features are top-level, easily extractable features, e.g., assignee and number of claims. The content feature performs lexical analysis of the claims associated with a patent. Graph features use relational learning to summarize patent references. We apply our methods to US patents using a labeled data set. Prior work has focused on metadata-only features, but we show that both graph and content features have significant predictive capability. Additionally, fusing all features results in improved performance. We also perform a preliminary examination of some of the qualitative factors that may have significant importance in patent litigation.

Corpora for the evaluation of robust speaker recognition systems

September 8, 2016

Conference Paper

Author:

Douglas E. Sturim

…

Published in:

INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The goal of this paper is to describe significant corpora available to support speaker recognition research and evaluation, along with details about the corpora collection and design. We describe the attributes of high-quality speaker recognition corpora. Considerations of the application, domain, and performance metrics are also discussed. Additionally, a literature survey of corpora used in speaker recognition research over the last 10 years is presented. Finally, we show the most common corpora used in the research community and review them on their success in enabling meaningful speaker recognition research.

Cross-domain entity resolution in social media

July 11, 2016

Conference Paper

Author:

William M. Campbell

…

Published in:

4th Int. Workshop on Natural Language Processing for Social Media, SocialNLP with IJCAI, 11 July 2016.

Topic:

social network

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The challenge of associating entities across multiple domains is a key problem in social media understanding. Successful cross-domain entity resolution provides integration of information from multiple sites to create a complete picture of user and community activities, characteristics, and trends. In this work, we examine the problem of entity resolution across Twitter and Instagram using general techniques. Our methods fall into three categories: profile, content, and graph based. For the profile-based methods, we consider techniques based on approximate string matching. For content-based methods, we perform author identification. Finally, for graph-based methods, we apply novel cross-domain community detection methods and generate neighborhood-based features. The three categories of methods are applied to a large graph of users in Twitter and Instagram to understand challenges, determine performance, and understand fusion of multiple methods. Final results demonstrate an equal error rate of less than 1%.
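As an illustration of the profile-based category, the sketch below links usernames across two sites by approximate string matching. It uses Python's standard `difflib.SequenceMatcher` as the similarity measure. The paper's actual matching technique is not specified here, so the normalization step, the 0.8 threshold, and the function names are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(username: str) -> str:
    """Lowercase the username and keep only letters and digits."""
    return "".join(ch for ch in username.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized usernames."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(username: str, candidates: list[str], threshold: float = 0.8):
    """Return the most similar candidate, or None if none reaches the threshold.

    The threshold is an illustrative assumption; in practice it would be
    tuned on labeled account pairs.
    """
    if not candidates:
        return None
    top = max(candidates, key=lambda c: similarity(username, c))
    return top if similarity(username, top) >= threshold else None
```

For example, `best_match("jane_doe88", ["janedoe_88", "john_doe", "jdoe"])` selects `"janedoe_88"`, because the two names are identical after normalization. Profile similarity alone is a weak signal, which is consistent with the abstract's emphasis on fusing it with content-based and graph-based methods.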

A fun and engaging interface for crowdsourcing named entities

May 23, 2016

Conference Paper

Author:

Kara B. Greenfield

…

Published in:

10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Topic:

human language technology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

There are many current problems in natural language processing that are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus has already been annotated. Annotating corpora for natural language processing tasks is typically a time-consuming and expensive process. In this paper, we provide a case study in using crowdsourcing to curate an in-domain corpus for named entity recognition, a common problem in natural language processing. In particular, we present our use of fun, engaging user interfaces as a way to entice workers to partake in our crowdsourcing task while avoiding inflating our payments in a way that would attract more mercenary workers than conscientious ones. Additionally, we provide a survey of alternate interfaces for collecting annotations of named entities and compare our approach to those systems.

Recommender systems for the Department of Defense and intelligence community

January 1, 2016

Journal Article

Author:

Vijay N. Gadepally

…

Published in:

Lincoln Laboratory Journal, Vol. 22, No. 1, 2016, pp. 74-89.

Topic:

supercomputing

R&D area:

Cyber Security and Information Sciences

R&D group:

Lincoln Laboratory Supercomputing Center

Summary

Recommender systems, which selectively filter information for users, can hasten analysts' responses to complex events such as cyber attacks. Lincoln Laboratory's research on recommender systems may bring the capabilities of these systems to analysts in both the Department of Defense and intelligence community.

Recommender systems for the Department of Defense and intelligence community

January 1, 2016

Journal Article

Author:

Vijay N. Gadepally

…

Published in:

Lincoln Laboratory Journal, Vol. 22, No. 1, 2016, pp. 74-89.

Topic:

machine learning

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems
