Lincoln Laboratory Journal - Volume 20, Number 1
Special Issue on Graphs and Networks
Nadya T. Bliss and Matthew C. Schmidt
New application needs, combined with the rapidly increasing sizes of datasets, are driving the development of a new field at the intersection of computer science, mathematics, signal processing, and the social sciences. Lincoln Laboratory has been pioneering research to address the challenges of using graphs and networks to exploit these vast databases. This issue of the Lincoln Laboratory Journal focuses on some of this innovative work.
Benjamin A. Miller, Nadya T. Bliss, Patrick J. Wolfe, and Michelle S. Beard
Graphs are fast emerging as a common data structure used in many scientific and engineering fields. While a wide variety of techniques exist to analyze graph datasets, practitioners currently lack a signal processing theory akin to that of detection and estimation in the classical setting of vector spaces with Gaussian noise. Using practical detection examples involving large, random "background" graphs and noisy real-world datasets, the authors present a novel graph analytics framework that allows for uncued analysis of very large datasets. This framework combines traditional computer science techniques with signal processing in the context of graph data, creating a new research area at the intersection of the two fields.
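One common way to detect an anomalous subgraph against a random background — in the spirit of the residual-based framework described above, though the specific model and parameters here are illustrative, not taken from the article — is to examine the leading eigenvalue of the modularity-matrix residual B = A - kkᵀ/2m. A minimal sketch:

```python
import numpy as np

def residual_top_eig(A):
    # Modularity-matrix residual B = A - k k^T / (2m); its leading
    # eigenvalue grows when some subgraph is denser than chance
    k = A.sum(axis=1)
    m = k.sum() / 2.0
    B = A - np.outer(k, k) / (2.0 * m)
    return np.linalg.eigvalsh(B)[-1]

rng = np.random.default_rng(0)
n = 200

# "Background": an Erdos-Renyi random graph (illustrative parameters)
A_bg = np.triu(rng.random((n, n)) < 0.05, 1).astype(float)
A_bg = A_bg + A_bg.T

# "Signal": embed a 20-vertex clique into a copy of the background
A_fg = A_bg.copy()
s = np.arange(20)
A_fg[np.ix_(s, s)] = 1.0
np.fill_diagonal(A_fg, 0.0)

lam_bg = residual_top_eig(A_bg)  # noise-only statistic
lam_fg = residual_top_eig(A_fg)  # statistic with embedded subgraph
```

Comparing `lam_fg` against `lam_bg` plays the role of a detection statistic: the embedded dense subgraph pushes the leading residual eigenvalue well above its background level, enabling uncued detection without knowing the subgraph's location in advance.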
Michael J. Yee, Scott Philips, Gary R. Condon, Peter B. Jones, Edward K. Kao, Steven T. Smith, Christian C. Anderson, and Frederick R. Waugh
Analysts can glean much useful intelligence information from identifying relationships between individuals and groups, and tracking their activities. However, detecting networks of people and then investigating their activities are difficult tasks, especially in this era of information overload. Graph analysis has proven to be a useful tool for addressing these tasks, but it can be labor-intensive. To aid in this analysis, Lincoln Laboratory researchers developed a diffusion-based analytic that helps solve the problems of network discovery and prioritized exploration. This analytic, called threat propagation, has been demonstrated to effectively handle network detection and exploration tasks, and has been integrated into an interactive tool for generating networks from wide-area motion imagery.
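The core idea of a diffusion-based analytic can be sketched with a random-walk-with-restart from a cued vertex; the graph, damping parameter, and scoring rule below are illustrative assumptions, not the article's exact threat-propagation algorithm:

```python
import numpy as np

# Toy adjacency matrix for a 6-vertex network; vertex 0 is the cued "threat"
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

# Row-normalize to obtain a random-walk transition matrix
P = A / A.sum(axis=1, keepdims=True)

# Diffuse threat from the cue with restarts (personalized-PageRank style;
# alpha = 0.85 is an assumed damping value, not from the article)
alpha = 0.85
cue = np.zeros(6)
cue[0] = 1.0
score = cue.copy()
for _ in range(100):
    score = alpha * P.T @ score + (1 - alpha) * cue

# Vertices close to the cue accumulate higher threat scores,
# giving an analyst a prioritized order of exploration
ranking = np.argsort(-score)
```

The resulting ranking concentrates analyst attention on vertices most strongly connected to the cue, which is the essence of prioritized network exploration.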
Steven T. Smith, Kenneth D. Senne, Scott Philips, Edward K. Kao, and Garrett Bernstein
Covert network detection is an important capability in areas of applied research in which the data of interest can be represented as a relatively small subgraph in an enormous, potentially uninteresting background. This aspect characterizes covert network detection as a "Big Data" problem. In this article, a new Bayesian network detection framework is introduced that partitions the graph on the basis of prior information and direct observations. A new generative stochastic model for covert networks is also explored, and the detection performance of both classes of optimal detection techniques is analyzed.
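A Bayesian partition of vertices into "covert" and "background" can be illustrated with a simple per-vertex posterior update; the Bernoulli observation model, rates, and prior below are assumptions for illustration, not the article's generative model:

```python
from math import comb

# Assumed model: each vertex produces "suspicious" interactions at
# rate p1 if covert, p0 if background; prior is the base rate of
# covert vertices. All three values are illustrative.
p0, p1 = 0.1, 0.6
prior = 0.05

def posterior_covert(k, n):
    # Posterior P(covert | k suspicious interactions out of n),
    # via Bayes' rule with binomial likelihoods
    like1 = comb(n, k) * p1**k * (1 - p1) ** (n - k)
    like0 = comb(n, k) * p0**k * (1 - p0) ** (n - k)
    return like1 * prior / (like1 * prior + like0 * (1 - prior))

strong_evidence = posterior_covert(8, 10)  # mostly suspicious activity
weak_evidence = posterior_covert(1, 10)    # mostly benign activity
```

Thresholding such posteriors partitions the graph: vertices whose observed behavior is far more likely under the covert model than the background model are flagged, while the prior keeps sparse evidence from triggering false alarms.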
William M. Campbell, Charlie K. Dagli, and Clifford J. Weinstein
Social network analysis has undergone a renaissance with the ubiquity and quantity of content from social media, web pages, and sensors. This content is a rich data source for constructing and analyzing social networks, but its enormity and unstructured nature also present multiple challenges. Work at Lincoln Laboratory is addressing the problems in constructing networks from unstructured data, analyzing the community structure of a network, and inferring information from networks. Graph analytics have proven to be valuable tools in solving these challenges. Through the use of these tools, Laboratory researchers have achieved promising results on real-world data. A sampling of these results is presented in this article.
Jeremy Kepner, Darrell O. Ricke, and Dylan Hutchison
The supercomputing community has taken up the challenge of "taming the beast" spawned by the massive amount of data available in the bioinformatics domain: How can these data be exploited faster and better? MIT Lincoln Laboratory computer scientists demonstrated how a new Laboratory-developed technology, the Dynamic Distributed Dimensional Data Model (D4M), can be used to accelerate DNA sequence comparison, a core operation in bioinformatics.
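The D4M approach casts sequence comparison as sparse linear algebra: each sequence becomes a sparse vector of k-mer counts, and comparing sequences reduces to a sparse inner product. The sketch below mimics that idea with plain Python containers; the k-mer length and sequences are illustrative (the D4M work used longer k-mers on real sequence data):

```python
from collections import Counter

def kmer_counts(seq, k=4):
    # Sparse vector of k-mer counts for one sequence
    # (k = 4 is chosen only to keep this toy example readable)
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def match_score(seq_a, seq_b, k=4):
    # Sparse inner product of two k-mer count vectors: the sum of
    # count products over shared k-mers, i.e., one entry of the
    # sparse matrix product used for all-pairs sequence comparison
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())

a = "ACGTACGTGGA"
b = "TTACGTACGTT"  # shares the ACGT-repeat motif with a
c = "GGGGCCCCGGG"  # shares no 4-mers with a
```

Stacking many such k-mer vectors into a sparse matrix turns all-pairs comparison into a single sparse matrix multiply — the operation D4M accelerates at scale.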
William S. Song, Jeremy Kepner, Vitaliy Gleyzer, Huy T. Nguyen, and Joshua I. Kramer
Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are hard-pressed to handle the throughput and memory requirements of graph computation. Lincoln Laboratory's graph-processor architecture represents a fundamental rethinking of processor architectures for this workload. It utilizes innovations that include high-bandwidth three-dimensional (3D) communication links, a sparse matrix-based graph instruction set, an accelerator-based architecture, a systolic sorter, randomized communications, a cacheless memory system, and 3D packaging.
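The connection between graph traversal and a sparse matrix-based instruction set can be seen in how breadth-first search reduces to repeated matrix-vector products. The sketch below shows this duality in NumPy on a toy dense matrix (a real graph processor would operate on sparse storage):

```python
import numpy as np

# Adjacency matrix of a small directed graph:
# 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 4
A = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
], dtype=np.int64)

# Breadth-first search from vertex 0, expressed as repeated
# matrix-vector products: A^T @ frontier yields the next frontier.
# This is the linear-algebraic formulation a sparse matrix-based
# graph instruction set is designed to accelerate.
frontier = np.zeros(5, dtype=np.int64)
frontier[0] = 1
visited = frontier.astype(bool)
levels = {0: 0}
level = 0
while frontier.any():
    level += 1
    nxt = (A.T @ frontier) > 0   # reach all out-neighbors at once
    nxt &= ~visited              # mask off already-visited vertices
    for v in np.flatnonzero(nxt):
        levels[int(v)] = level
    visited |= nxt
    frontier = nxt.astype(np.int64)
```

Because each traversal step is a single matrix-vector product, hardware that streams sparse matrix operations efficiently accelerates the whole family of graph algorithms built from them.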
Peter Cho and Noah Snavely
Recent advances in computer vision have enabled the automatic recovery of camera and scene geometry from large collections of photographs and videos. Such three-dimensional imagery reconstructions may be georegistered with maps based upon ladar, geographic information system, and/or GPS data. Once 3D frameworks for analyzing two-dimensional digital pictures are established, high-level knowledge readily propagates among data products collected at different times, places, and perspectives. We demonstrate geometry-based exploitation for several imagery applications of importance to the defense and intelligence communities: perimeter surveillance via a single stationary camera, rural reconnaissance via a mobile aerial camera, urban mapping via several semicooperative ground cameras, and social media mining via many uncooperative cameras. Though a priori camera uncertainty grows in this series and requires progressively more computational power to resolve, a geometrical framework renders all these applications tractable.