Publications

Joint audio-visual mining of uncooperatively collected video: FY14 Line-Supported Information, Computation, and Exploitation Program

Summary

The rate at which video is being created and gathered is rapidly accelerating as access to means of production and distribution expands. This rate of increase, however, is greatly outpacing the development of content-based tools to help users sift through this unstructured, multimedia data. The need for such technologies becomes more acute when considering their potential value in critical, media-rich government applications such as Seized Media Analysis, Social Media Forensics, and Foreign Media Monitoring. A fundamental challenge in developing technologies in these application areas is that they are typically in low-resource data domains. Low-resource domains are ones where the lack of ground-truth labels and statistical support prevents the direct application of traditional machine learning approaches. To help bridge this capability gap, the Joint Audio and Visual Mining of Uncooperatively Collected Video ICE Line Program (2236-1301) is developing new technologies for better content-based search, summarization, and browsing of large collections of unstructured, uncooperatively collected multimedia. In particular, this effort seeks to improve capabilities in video understanding by jointly exploiting time-aligned audio, visual, and text information, an approach which has been underutilized in both the academic and commercial communities. Exploiting subtle connections between and across multiple modalities in low-resource multimedia data helps enable deeper video understanding, and in some cases provides new capability where none previously existed. This report outlines work done in Fiscal Year 2014 (FY14) by the cross-divisional, interdisciplinary team tasked to meet these objectives. In the following sections, we highlight technologies developed in FY14 to support efficient Query-by-Example, Attribute, Keyword Search and Cross-Media Exploration and Summarization. Additionally, we preview work proposed for Fiscal Year 2015 and summarize our external sponsor interactions and publications/presentations.

Bayesian discovery of threat networks

Published in:
IEEE Trans. Signal Process., Vol. 62, No. 20, 15 October 2014, pp. 5324-38.

Summary

A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a graph, at least one observation, and a diffusion model for threat. A link to well-known spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio-temporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random covert networks with realistic properties.
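
The three ingredients named in the summary (a graph, at least one observation, and a diffusion model) can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a simple diffusion model in which a random walk terminates at each step with a hypothetical probability `absorb`, so that each vertex's score is the probability that a walk starting there reaches an observed vertex before terminating. The function name, parameters, and the toy graph are all illustrative assumptions.

```python
def threat_scores(adjacency, observed, absorb=0.1, iterations=200):
    """Iteratively estimate a threat score for each vertex.

    adjacency: dict mapping vertex -> list of neighbouring vertices
    observed:  set of vertices with observed threat (score held at 1.0)
    absorb:    ASSUMED per-step probability that the random walk terminates
    """
    scores = {v: (1.0 if v in observed else 0.0) for v in adjacency}
    for _ in range(iterations):
        updated = {}
        for v, neighbours in adjacency.items():
            if v in observed:
                updated[v] = 1.0  # observations act as fixed boundary values
            elif neighbours:
                # Survive one step, then move to a uniformly chosen neighbour.
                mean = sum(scores[n] for n in neighbours) / len(neighbours)
                updated[v] = (1.0 - absorb) * mean
            else:
                updated[v] = 0.0  # isolated, unobserved vertex
        scores = updated
    return scores


# Toy path graph a - b - c - d with a single observation at vertex "a".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = threat_scores(graph, observed={"a"})
```

In this toy setting the score decays with distance from the observed vertex. Without the assumed absorption term, every vertex connected to an observation would converge to a score of 1, which is why some diffusion model is needed alongside the graph and the observations.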

D4M 2.0 Schema: a general purpose high performance schema for the Accumulo database

Summary

Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel schemas. The Dynamic Distributed Dimensional Data Model (D4M) [http://www.mit.edu/~kepner/D4M] provides a uniform mathematical framework based on associative arrays that encompasses both traditional (i.e., SQL) and non-traditional databases. For non-traditional databases D4M naturally leads to a general purpose schema that can be used to fully index and rapidly query every unique string in a dataset. The D4M 2.0 Schema has been applied with little or no customization to cyber, bioinformatics, scientific citation, free text, and social media data. The D4M 2.0 Schema is simple, requires minimal parsing, and achieves the highest published Accumulo ingest rates. The benefits of the D4M 2.0 Schema are independent of the D4M interface. Any interface to Accumulo can achieve these benefits by using the D4M 2.0 Schema.
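
One way to picture a schema that indexes every unique string is to turn each field/value pair of a record into its own column key and to store the transpose alongside it. The sketch below uses plain Python dictionaries rather than Accumulo; the `|` separator, the table names `edge`, `edge_t`, and `degree`, and the sample records are assumptions made for illustration, not details taken from the summary.

```python
from collections import defaultdict


def explode(records, sep="|"):
    """Build an edge table, its transpose, and a degree table from records.

    records: dict mapping a row key -> dict of field -> value
    """
    edge = defaultdict(dict)    # row key -> {"field|value": 1}
    edge_t = defaultdict(dict)  # "field|value" -> {row key: 1}
    degree = defaultdict(int)   # "field|value" -> number of rows containing it
    for row, fields in records.items():
        for field, value in fields.items():
            col = f"{field}{sep}{value}"
            edge[row][col] = 1
            edge_t[col][row] = 1
            degree[col] += 1
    return edge, edge_t, degree


# Hypothetical sample records keyed by an arbitrary row identifier.
records = {
    "log001": {"src": "10.0.0.1", "dst": "10.0.0.9"},
    "log002": {"src": "10.0.0.1", "dst": "10.0.0.7"},
}
edge, edge_t, degree = explode(records)

# Looking up a string is a single key lookup in the transposed table.
rows_with_src = sorted(edge_t["src|10.0.0.1"])
```

Because every value appears as a key in the transposed table, finding all rows that contain a given string needs no scan of the records, and the degree table indicates how many rows a query will return before it is run.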

Estimation of Causal Peer Influence Effects

Published in:
International Conference on Machine Learning, 17-19 June 2013

Summary

The broad adoption of social media has generated interest in leveraging peer influence for inducing desired user behavior. Quantifying the causal effect of peer influence presents technical challenges, however, including how to deal with social interference, complex response functions, and network uncertainty. In this paper, we extend potential outcomes to allow for interference, we introduce well-defined causal estimands of peer influence, and we develop two estimation procedures: a frequentist procedure relying on a sequential randomization design that requires knowledge of the network but operates under complicated response functions, and a Bayesian procedure which accounts for network uncertainty but relies on a linear response assumption to increase estimation precision. Our results show the advantages and disadvantages of the proposed methods in a number of situations.

Detection theory for graphs

Summary

Graphs are fast emerging as a common data structure used in many scientific and engineering fields. While a wide variety of techniques exist to analyze graph datasets, practitioners currently lack a signal processing theory akin to that of detection and estimation in the classical setting of vector spaces with Gaussian noise. Using practical detection examples involving large, random "background" graphs and noisy real-world datasets, the authors present a novel graph analytics framework that allows for uncued analysis of very large datasets. This framework combines traditional computer science techniques with signal processing in the context of graph data, creating a new research area at the intersection of the two fields.

Dynamic Distributed Dimensional Data Model (D4M) database and computation system

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language "SQL" databases). D4M allows linear algebra to be readily applied to databases. Using D4M, it is possible to create composable analytics with significantly less effort than using traditional approaches. This work describes the D4M technology and its application and performance.
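
The idea of applying linear algebra to database contents can be sketched with a sparse matrix whose rows and columns are indexed by strings. The code below is a minimal pure-Python illustration and does not use the D4M library or its API; the helper names `transpose` and `multiply` and the sample data are hypothetical.

```python
def transpose(a):
    """Swap row and column keys of a dict-of-dicts sparse matrix."""
    out = {}
    for r, cols in a.items():
        for c, v in cols.items():
            out.setdefault(c, {})[r] = v
    return out


def multiply(a, b):
    """Sparse product over string keys: out[r][c] = sum over k of a[r][k] * b[k][c]."""
    out = {}
    for r, row in a.items():
        for k, av in row.items():
            for c, bv in b.get(k, {}).items():
                out.setdefault(r, {})
                out[r][c] = out[r].get(c, 0) + av * bv
    return out


# Hypothetical document/term incidence data, as it might be read from a tuple store.
docs = {
    "doc1": {"word|graph": 1, "word|radar": 1},
    "doc2": {"word|graph": 1},
}

# Term co-occurrence counts expressed as a single matrix product.
cooccur = multiply(transpose(docs), docs)
```

Expressing the co-occurrence count as a product of a table with its own transpose is what makes such analytics composable: the same two operations can be chained to build further queries without writing new record-by-record loops.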

Discrete optimization using decision-directed learning for distributed networked computing

Summary

Decision-directed learning (DDL) is an iterative discrete approach to finding a feasible solution for large-scale combinatorial optimization problems. DDL is capable of efficiently formulating a solution to network scheduling problems that involve load limiting device utilization, selecting parallel configurations for software applications and host hardware using a minimum set of resources, and meeting time-to-result performance requirements in a dynamic network environment. This paper quantifies the algorithms that constitute DDL and compares its performance to other popular combinatorial self-directed real-time networked resource configuration for dynamically building a mission specific signal-processor for real-time distributed and parallel applications.

ITWS microburst prediction algorithm performance, capabilities, and limitations

Summary

Lincoln Laboratory, under funding from the Federal Aviation Administration (FAA) Terminal Doppler Weather Radar program, has developed algorithms for automatically detecting microbursts. While microburst detection algorithms provide highly reliable warnings of microbursts, there still remains a period of time between microburst onset and pilot reaction during which aircraft are at risk. This latency is due to the time needed for the automated algorithms to operate on the radar data, for air traffic controllers to relay any warnings, and for pilots to react to the warnings. Lincoln Laboratory research and development has yielded an algorithm for accurately predicting when microburst outflows will occur. The Microburst Prediction Algorithm is part of a suite of weather detection algorithms within the Integrated Terminal Weather System. This paper details the performance of the Microburst Prediction Algorithm over a wide range of geographical and climatological environments. The paper also discusses the full range of the Microburst Prediction Algorithm's capabilities and limitations in varied weather environments. This paper does not discuss the overall rationale for a prediction algorithm or the detailed methodology used to generate predictions.