Publications

Refine Results

(Filters Applied) Clear All

Global pattern search at scale

Summary

In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to the complexities of data processing, presenting the data in a way that is meaningful to the end user poses another challenge. In our work, we focused on analyzing a corpus of 35,000 news articles and creating an interactive geovisualization tool to reveal patterns to human analysts. Our comprehensive tool, Global Pattern Search at Scale (GPSS), addresses three major problems in data analysis: free text analysis, high volumes of data, and interactive visualization. GPSS uses an Accumulo database for high-volume data storage, and a matrix of word counts and event detection algorithms to process the free text. For visualization, the tool displays an interactive web application to the user, featuring a map overlaid with document clusters and events, search and filtering options, a timeline, and a word cloud. In addition, the GPSS tool can be easily adapted to process and understand other large free-text datasets.
READ LESS

Summary

In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to...

READ MORE

Very large graphs for information extraction (VLG) - summary of first-year proof-of-concept study

Summary

In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel techniques for anomalous subgraph detection, building on tools in the signal processing research literature. This report documents the technical results of this effort. Two datasets--a snapshot of Thompson Reuters? Web of Science database and a stream of web proxy logs--were parsed, and graphs were constructed from the raw data. From the phenomena in these datasets, several algorithms were developed to model the dynamic graph behavior, including a preferential attachment mechanism with memory, a streaming filter to model a graph as a weighted average of its past connections, and a generalized linear model for graphs where connection probabilities are determined by additional side information or metadata. A set of metrics was also constructed to facilitate comparison of techniques. The study culminated in a demonstration of the algorithms on the datasets of interest, in addition to simulated data. Performance in terms of detection, estimation, and computational burden was measured according to the metrics. Among the highlights of this demonstration were the detection of emerging coauthor clusters in the Web of Science data, detection of botnet activity in the web proxy data after 15 minutes (which took 10 days to detect using state-of-the-practice techniques), and demonstration of the core algorithm on a simulated 1-billion-vertex graph using a commodity computing cluster.
READ LESS

Summary

In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel...

READ MORE

Characterization of traffic and structure in the U.S. airport network

Summary

In this paper we seek to characterize traffic in the U.S. air transportation system, and to subsequently develop improved models of traffic demand. We model the air traffic within the U.S. national airspace system as dynamic weighted network. We employ techniques advanced by work in complex networks over the past several years in characterizing the structure and dynamics of the U.S. airport network. We show that the airport network is more dynamic over successive days than has been previously reported. The network has some properties that appear stationary over time, while others exhibit a high degree of variation. We characterize the network and its dynamics using structural measures such as degree distributions and clustering coefficients. We employ spectral analysis to show that dominant eigenvectors of the network are nearly stationary with time. We use this observation to suggest how low dimensional models of traffic demand in the airport network can be fashioned.
READ LESS

Summary

In this paper we seek to characterize traffic in the U.S. air transportation system, and to subsequently develop improved models of traffic demand. We model the air traffic within the U.S. national airspace system as dynamic weighted network. We employ techniques advanced by work in complex networks over the past...

READ MORE

A scalable signal processing architecture for massive graph analysis

Published in:
ICASSP 2012, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 25-30 March 2012, pp. 5329-32.

Summary

In many applications, it is convenient to represent data as a graph, and often these datasets will be quite large. This paper presents an architecture for analyzing massive graphs, with a focus on signal processing applications such as modeling, filtering, and signal detection. We describe the architecture, which covers the entire processing chain, from data storage to graph construction to graph analysis and subgraph detection. The data are stored in a new format that allows easy extraction of graphs representing any relationship existing in the data. The principal analysis algorithm is the partial eigendecomposition of the modularity matrix, whose running time is discussed. A large document dataset is analyzed, and we present subgraphs that stand out in the principal eigenspace of the time varying graphs, including behavior we regard as clutter as well as small, tightly-connected clusters that emerge over time.
READ LESS

Summary

In many applications, it is convenient to represent data as a graph, and often these datasets will be quite large. This paper presents an architecture for analyzing massive graphs, with a focus on signal processing applications such as modeling, filtering, and signal detection. We describe the architecture, which covers the...

READ MORE

Showing Results

1-4 of 4