Publications

Refine Results

(Filters Applied) Clear All

R&D Areas

R&D Groups

Year

Items per page

Causal inference under network interference: a framework for experiments on social networks

January 1, 2017

Thesis

Author:

Edward K. Kao

Published in:

Thesis (Ph.D.)--Harvard University, 2017.

Topic:

networking

R&D area:

Homeland Protection

R&D group:

Homeland Decision Support Systems

Summary

No man is an island, as individuals interact and influence one another daily in our society. When social influence takes place in experiments on a population of interconnected individuals, the treatment on a unit may affect the outcomes of other units, a phenomenon known as interference. This thesis develops a causal framework and inference methodology for experiments where interference takes place on a network of influence (i.e. network interference). In this framework, the network potential outcomes serve as the key quantity and flexible building blocks for causal estimands that represent a variety of primary, peer, and total treatment effects. These causal estimands are estimated via principled Bayesian imputation of missing outcomes. The theory on the unconfoundedness assumptions leading to simplified imputation highlights the importance of including relevant network covariates in the potential outcome model. Additionally, experimental designs that result in balanced covariates and sizes across treatment exposure groups further improve the causal estimate, especially by mitigating potential outcome model mis-specification. The true potential outcome model is not typically known in real-world experiments, so the best practice is to account for interference and confounding network covariates through both balanced designs and model-based imputation. A full factorial simulated experiment is formulated to demonstrate this principle by comparing performance across different randomization schemes during the design phase and estimators during the analysis phase, under varying network topology and true potential outcome models. Overall, this thesis asserts that interference is not just a nuisance for analysis but rather an opportunity for quantifying and leveraging peer effects in real-world experiments.

READ LESS

Summary

Causal inference under network interference: a framework for experiments on social networks

Intersection and convex combination in multi-source spectral planted cluster detection

December 7, 2016

Conference Paper

Author:

Benjamin A. Miller

…

Published in:

IEEE Global Conf. on Signal and Information Processing, GlobalSIP, 7-9 December 2016.

Topic:

signal processing

R&D area:

R&D group:

Homeland Decision Support Systems

Summary

Planted cluster detection is an important form of signal detection when the data are in the form of a graph. When there are multiple graphs representing multiple connection types, the method of aggregation can have significant impact on the results of a detection algorithm. This paper addresses the tradeoff between two possible aggregation methods: convex combination and intersection. For a spectral detection method, convex combination dominates when the cluster is relatively sparse in at least one graph, while the intersection method dominates in cases where it is dense across graphs. Experimental results confirm the theory. We consider the context of adversarial cluster placement, and determine how an adversary would distribute connections among the graphs to best avoid detection.

READ LESS

Summary

Intersection and convex combination in multi-source spectral planted cluster detection

LLTools: machine learning for human language processing

December 5, 2016

Conference Paper

Author:

Cagri K. Dagli

…

Published in:

30th Conf. on Neural Info. Processing Syst., NIPS 2016, 5-10 December 2016.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Machine learning methods in Human Language Technology have reached a stage of maturity where widespread use is both possible and desirable. The MIT Lincoln Laboratory LLTools software suite provides a step towards this goal by providing a set of easily accessible frameworks for incorporating speech, text, and entity resolution components into larger applications. For the speech processing component, the pySLGR (Speaker, Language, Gender Recognition) tool provides signal processing, standard feature analysis, speech utterance embedding, and machine learning modeling methods in Python. The text processing component in LLTools extracts semantically meaningful insights from unstructured data via entity extraction, topic modeling, and document classification. The entity resolution component in LLTools provides approximate string matching, author recognition and graph-based methods for identifying and linking different instances of the same real-world entity. We show through two applications that LLTools can be used to rapidly create and train research prototypes for human language processing.

READ LESS

Summary

LLTools: machine learning for human language processing

Sparse-coded net model and applications

September 13, 2016

Conference Paper

Author:

Youngjune L. Gwon

…

Published in:

2016 IEEE Int. Workshop on Machine Learning for Signal Processing, 13-16 September 2016.

Topic:

artificial intelligence

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

As an unsupervised learning method, sparse coding can discover high-level representations for an input in a large variety of learning problems. Under semi-supervised settings, sparse coding is used to extract features for a supervised task such as classification. While sparse representations learned from unlabeled data independently of the supervised task perform well, we argue that sparse coding should also be built as a holistic learning unit optimizing on the supervised task objectives more explicitly. In this paper, we propose sparse-coded net, a feedforward model that integrates sparse coding and task-driven output layers, and describe training methods in detail. After pretraining a sparse-coded net via semi-supervised learning, we optimize its task-specific performance in a novel backpropagation algorithm that can traverse nonlinear feature pooling operators to update the dictionary. Thus, sparse-coded net can be applied to supervised dictionary learning. We evaluate sparse-coded net with classification problems in sound, image, and text data. The results confirm a significant improvement over semi-supervised learning as well as superior classification performance against deep stacked autoencoder neural network and GMM-SVM pipelines in small to medium-scale settings.

READ LESS

Summary

Sparse-coded net model and applications

Cross-domain entity resolution in social media

July 11, 2016

Conference Paper

Author:

William M. Campbell

…

Published in:

4th Int. Workshop on Natural Language Processing for Social Media, SocialNLP with IJCAI, 11 July 2016.

Topic:

social network

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The challenge of associating entities across multiple domains is a key problem in social media understanding. Successful cross-domain entity resolution provides integration of information from multiple sites to create a complete picture of user and community activities, characteristics, and trends. In this work, we examine the problem of entity resolution across Twitter and Instagram using general techniques. Our methods fall into three categories: profile, content, and graph based. For the profile-based methods, we consider techniques based on approximate string matching. For content-based methods, we perform author identification. Finally, for graph-based methods, we apply novel cross-domain community detection methods and generate neighborhood-based features. The three categories of methods are applied to a large graph of users in Twitter and Instagram to understand challenges, determine performance, and understand fusion of multiple methods. Final results demonstrate an equal error rate less than 1%.

READ LESS

Summary

Cross-domain entity resolution in social media

A reverse approach to named entity extraction and linking in microposts

April 11, 2016

Conference Paper

Author:

Kara B. Greenfield

…

Published in:

Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 67-69.

Topic:

topic recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions that are not co-referent with any entities in the knowledge base.

READ LESS

Summary

A reverse approach to named entity extraction and linking in microposts

Named entity recognition in 140 characters or less

April 11, 2016

Conference Paper

Author:

Kelly L. Geyer

…

Published in:

Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 78-79.

Topic:

topic recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

In this paper, we explore the problem of recognizing named entities in microposts, a genre with notoriously little context surrounding each named entity and inconsistent use of grammar, punctuation, capitalization, and spelling conventions by authors. In spite of the challenges associated with information extraction from microposts, it remains an increasingly important genre. This paper presents the MIT Information Extraction Toolkit (MITIE) and explores its adaptability to the micropost genre.

READ LESS

Summary

Named entity recognition in 140 characters or less

Joint audio-visual mining of uncooperatively collected video: FY14 Line-Supported Information, Computation, and Exploitation Program

February 5, 2015

Project Report

Author:

Cagri K. Dagli

…

Published in:

MIT Lincoln Laboratory Report LSP-116

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The rate at which video is being created and gathered is rapidly accelerating as access to means of production and distribution expand. This rate of increase, however, is greatly outpacing the development of content-based tools to help users sift through this unstructured, multimedia data. The need for such technologies becomes more acute when considering their potential value in critical, media-rich government applications such as Seized Media Analysis, Social Media Forensics, and Foreign Media Monitoring. A fundamental challenge in developing technologies in these application areas is that they are typically in low-resource data domains. Low-resource domains are ones where the lack of ground-truth labels and statistical support prevent the direct application of traditional machine learning approaches. To help bridge this capability gap, the Joint Audio and Visual Mining of Uncooperatively Collected Video ICE Line Program (2236-1301) is developing new technologies for better content-based search, summarization, and browsing of large collections of unstructured, uncooperatively collected multimedia. In particular, this effort seeks to improve capabilities in video understanding by jointly exploiting time aligned audio, visual, and text information, an approach which has been underutilized in both the academic and commercial communities. Exploiting subtle connections between and across multiple modalities in low-resource multimedia data helps enable deeper video understanding, and in some cases provides new capability where it previously didn't exist. This report outlines work done in Fiscal Year 2014 (FY14) by the cross-divisional, interdisciplinary team tasked to meet these objectives. In the following sections, we highlight technologies developed in FY14 to support efficient Query-by-Example, Attribute, Keyword Search and Cross-Media Exploration and Summarization. Additionally, we preview work proposed for Fiscal Year 2015 as well as summarize our external sponsor interactions and publications/presentations.

READ LESS

Summary

Joint audio-visual mining of uncooperatively collected video: FY14 Line-Supported Information, Computation, and Exploitation Program

Bayesian discovery of threat networks

October 15, 2014

Journal Article

Author:

Steven T. Smith

…

Published in:

IEEE Trans. Signal Process., Vol. 62, No. 20, 15 October 2014, pp. 5324-38.

Topic:

signal processing

R&D area:

Homeland Protection

R&D group:

Homeland Decision Support Systems

Summary

A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a graph, at least one observation, and a diffusion model for threat. A link to well-known spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio-temporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random covert networks with realistic properties.

READ LESS

Summary

Bayesian discovery of threat networks

D4M 2.0 Schema: a general purpose high performance schema for the Accumulo database

September 10, 2013

Conference Paper

Author:

Jeremy Kepner

…

Published in:

HPEC 2013: IEEE Conf. on High Performance Extreme Computing, 10-12 September 2013.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel schemas. The Dynamic Distributed Dimensional Data Model (D4M) [http://www.mit.edu/~kepner/D4M] provides a uniform mathematical framework based on associative arrays that encompasses both traditional (i.e., SQL) and non-traditional databases. For non-traditional databases D4M naturally leads to a general purpose schema that can be used to fully index and rapidly query every unique string in a dataset. The D4M 2.0 Schema has been applied with little or no customization to cyber, bioinformatics, scientific citation, free text, and social media data. The D4M 2.0 Schema is simple, requires minimal parsing, and achieves the highest published Accumulo ingest rates. The benefits of the D4M 2.0 Schema are independent of the D4M interface. Any interface to Accumulo can achieve these benefits by using the D4M 2.0 Schema.

READ LESS

Summary

D4M 2.0 Schema: a general purpose high performance schema for the Accumulo database

Publications

Refine Results

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Joint audio-visual mining of uncooperatively collected video: FY14 Line-Supported Information, Computation, and Exploitation Program

Summary

Summary

Summary

Summary

Summary

Summary

Showing Results