Publications

Refine Results

(Filters Applied) Clear All

Analytical models and methods for anomaly detection in dynamic, attributed graphs

Published in:
Chapter 2, Computational Network Analysis with R: Applications in Biology, Medicine, and Chemistry, 2017, pp. 35-61.

Summary

This chapter is devoted to anomaly detection in dynamic, attributed graphs. There has been a great deal of research on anomaly detection in graphs over the last decade, with a variety of methods proposed. This chapter discusses recent methods for anomaly detection in graphs,with a specific focus on detection within backgrounds based on random graph models. This sort of analysis can be applied for a variety of background models, which can incorporate topological dynamics and attributes of vertices and edges. The authors have developed a framework for anomalous subgraph detection in random background models, based on linear algebraic features of a graph. This includes an implementation in R that exploits structure in the random graph model for computationally tractable analysis of residuals. This chapter outlines this framework within the context of analyzing dynamic, attributed graphs. The remainder of this chapter is organized as follows. Section 2.2 defines the notation used within the chapter. Section 2.3 briefly describes a variety of perspectives and techniques for anomaly detection in graph-based data. Section 2.4 provides an overview of models for graph behavior that can be used as backgrounds for anomaly detection. Section 2.5 describes our framework for anomalous subgraph detection via spectral analysis of residuals, after the data are integrated over time. Section 2.6 discusses how the method described in Section 2.5 can be efficiently implemented in R using open source packages. Section 2.7 demonstrates the power of this technique in controlled simulation, considering the effects of both dynamics and attributes on detection performance. Section 2.8 gives a data analysis example within this context, using an evolving citation graph based on a commercially available document database of public scientific literature. Section 2.9 summarizes the chapter and discusses ongoing research in this area.
READ LESS

Summary

This chapter is devoted to anomaly detection in dynamic, attributed graphs. There has been a great deal of research on anomaly detection in graphs over the last decade, with a variety of methods proposed. This chapter discusses recent methods for anomaly detection in graphs,with a specific focus on detection within...

READ MORE

A spectral framework for anomalous subgraph detection

Published in:
IEEE Trans. Signal Process., Vol. 63, No. 16, 15 August 2015, 4191-4206.

Summary

A wide variety of application domains is concerned with data consisting of entities and their relationships or connections, formally represented as graphs. Within these diverse application areas, a common problem of interest is the detection of a subset of entities whose connectivity is anomalous with respect to the rest of the data. While the detection of such anomalous subgraphs has received a substantial amount of attention, no application-agnostic framework exists for analysis of signal detectability in graph-based data. In this paper, we describe a framework that enables such analysis using the principal eigenspace of a graph's residuals matrix, commonly called the modularity matrix in community detection. Leveraging this analytical tool, we show that the framework has a natural power metric in the spectral norm of the anomalous subgraph's adjacency matrix (signal power) and of the background graph's residuals matrix (noise power). We propose several algorithms based on spectral properties of the residuals matrix, with more computationally expensive techniques providing greater detection power. Detection and identification performance are presented for a number of signal and noise models, including clusters and bipartite foregrounds embedded into simple random backgrounds, as well as graphs with community structure and realistic degree distributions. The trends observed verify intuition gleaned from other signal processing areas, such as greater detection power when the signal is embedded within a less active portion of the background. We demonstrate the utility of the proposed techniques in detecting small, highly anomalous subgraphs in real graphs derived from Internet traffic and product co-purchases.
READ LESS

Summary

A wide variety of application domains is concerned with data consisting of entities and their relationships or connections, formally represented as graphs. Within these diverse application areas, a common problem of interest is the detection of a subset of entities whose connectivity is anomalous with respect to the rest of...

READ MORE

Temporal and multi-source fusion for detection of innovation in collaboration networks

Published in:
Proc. of the 18th Int. Conf. On Information Fusion, 6-9 July 2015.

Summary

A common problem in network analysis is detecting small subgraphs of interest within a large background graph. This includes multi-source fusion scenarios where data from several modalities must be integrated to form the network. This paper presents an application of novel techniques leveraging the signal processing for graphs algorithmic framework, to well-studied collaboration networks in the field of evolutionary biology. Our multi-disciplinary approach allows us to leverage case studies of transformative periods in this scientific field as truth. We build on previous work by optimizing the temporal integration filters with respect to truth data using a tensor decomposition method that maximizes the spectral norm of the integrated subgraph's adjacency matrix. We also demonstrate that we can mitigate data corruption via fusion of different data sources, demonstrating the power of this analysis framework for incomplete and corrupted data.
READ LESS

Summary

A common problem in network analysis is detecting small subgraphs of interest within a large background graph. This includes multi-source fusion scenarios where data from several modalities must be integrated to form the network. This paper presents an application of novel techniques leveraging the signal processing for graphs algorithmic framework...

READ MORE

Very Large Graphs for Information Extraction (VLG) - summary of first-year proof-of-concept study

Summary

In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel techniques for anomalous subgraph detection, building on tools in the signal processing research literature. This report documents the technical results of this effort. Two datasets--a snapshot of Thompson Reuters? Web of Science database and a stream of web proxy logs--were parsed, and graphs were constructed from the raw data. From the phenomena in these datasets, several algorithms were developed to model the dynamic graph behavior, including a preferential attachment mechanism with memory, a streaming filter to model a graph as a weighted average of its past connections, and a generalized linear model for graphs where connection probabilities are determined by additional side information or metadata. A set of metrics was also constructed to facilitate comparison of techniques. The study culminated in a demonstration of the algorithms on the datasets of interest, in addition to simulated data. Performance in terms of detection, estimation, and computational burden was measured according to the metrics. Among the highlights of this demonstration were the detection of emerging coauthor clusters in the Web of Science data, detection of botnet activity in the web proxy data after 15 minutes (which took 10 days to detect using state-of-the-practice techniques), and demonstration of the core algorithm on a simulated 1-billion-vertex graph using a commodity computing cluster.
READ LESS

Summary

In numerous application domains relevant to the Department of Defense and the Intelligence Community, data of interest take the form of entities and the relationships between them, and these data are commonly represented as graphs. Under the Very Large Graphs for Information Extraction effort--a one-year proof-of-concept study--MIT LL developed novel...

READ MORE

Efficient anomaly detection in dynamic, attributed graphs: emerging phenomena and big data

Published in:
ISI 2013: IEEE Int. Conf. on Intelligence and Security Informatics, 4-7 June 2013.

Summary

When working with large-scale network data, the interconnected entities often have additional descriptive information. This additional metadata may provide insight that can be exploited for detection of anomalous events. In this paper, we use a generalized linear model for random attributed graphs to model connection probabilities using vertex metadata. For a class of such models, we show that an approximation to the exact model yields an exploitable structure in the edge probabilities, allowing for efficient scaling of a spectral framework for anomaly detection through analysis of graph residuals, and a fast and simple procedure for estimating the model parameters. In simulation, we demonstrate that taking into account both attributes and dynamics in this analysis has a much more significant impact on the detection of an emerging anomaly than accounting for either dynamics or attributes alone. We also present an analysis of a large, dynamic citation graph, demonstrating that taking additional document metadata into account emphasizes parts of the graph that would not be considered significant otherwise.
READ LESS

Summary

When working with large-scale network data, the interconnected entities often have additional descriptive information. This additional metadata may provide insight that can be exploited for detection of anomalous events. In this paper, we use a generalized linear model for random attributed graphs to model connection probabilities using vertex metadata. For...

READ MORE

P-sync: a photonically enabled architecture for efficient non-local data access

Summary

Communication in multi- and many-core processors has long been a bottleneck to performance due to the high cost of long-distance electrical transmission. This difficulty has been partially remedied by architectural constructs such as caches and novel interconnect topologies, albeit at a steep cost in terms of complexity. Unfortunately, even these measures are rendered ineffective by certain kinds of communication, most notably scatter and gather operations that exhibit highly non-local data access patterns. Much work has gone into examining how the increased bandwidth density afforded by chip-scale silicon photonic interconnect technologies affects computing, but photonics have additional properties that can be leveraged to greatly accelerate performance and energy efficiency under such difficult loads. This paper describes a novel synchronized global photonic bus and system architecture called P-sync that uses photonics' distance independence to greatly improve performance on many important applications previously limited by electronic interconnect. The architecture is evaluated in the context of a non-local yet common application: the distributed Fast Fourier Transform. We show that it is possible to achieve high efficiency by tightly balancing computation and communication latency in P-sync and achieve upwards of a 6x performance increase on gather patterns, even when bandwidth is equalized.
READ LESS

Summary

Communication in multi- and many-core processors has long been a bottleneck to performance due to the high cost of long-distance electrical transmission. This difficulty has been partially remedied by architectural constructs such as caches and novel interconnect topologies, albeit at a steep cost in terms of complexity. Unfortunately, even these...

READ MORE

LLGrid: supercomputer for sensor processing

Summary

MIT Lincoln Laboratory is a federally funded research and development center that applies advanced technology to problems of national interest. Research and development activities focus on long-term technology development as well as rapid system prototyping and demonstration. A key part of this mission is to develop and deploy advanced sensor systems. Developing the algorithms for these systems requires interactive access to large scale computing and data storage. Deploying these systems requires that the computing and storage capabilities are transportable and energy efficient. The LLGrid system of supercomputers allows hundreds of researchers simultaneous interactive access to large amounts of processing and storage for development and testing of their sensor processing algorithms. The requirements of the LLGrid user base are as diverse as the sensors they are developing: sonar, radar, infrared, optical, hyperspectral, video, bio and cyber. However, there are two common elements: delivering large amounts of data interactively to many processors and high level user interfaces that require minimal user training. The LLGrid software stack provides these capabilities on dozens of LLGrid computing clusters across Lincoln Laboratory. LLGrid systems range from very small (a few nodes) to very large (40+ racks).
READ LESS

Summary

MIT Lincoln Laboratory is a federally funded research and development center that applies advanced technology to problems of national interest. Research and development activities focus on long-term technology development as well as rapid system prototyping and demonstration. A key part of this mission is to develop and deploy advanced sensor...

READ MORE

Detection theory for graphs

Summary

Graphs are fast emerging as a common data structure used in many scientific and engineering fields. While a wide variety of techniques exist to analyze graph datasets, practitioners currently lack a signal processing theory akin to that of detection and estimation in the classical setting of vector spaces with Gaussian noise. Using practical detection examples involving large, random "background" graphs and noisy real-world datasets, the authors present a novel graph analytics framework that allows for uncued analysis of very large datasets. This framework combines traditional computer science techniques with signal processing in the context of graph data, creating a new research area at the intersection of the two fields.
READ LESS

Summary

Graphs are fast emerging as a common data structure used in many scientific and engineering fields. While a wide variety of techniques exist to analyze graph datasets, practitioners currently lack a signal processing theory akin to that of detection and estimation in the classical setting of vector spaces with Gaussian...

READ MORE

Characterization of traffic and structure in the U.S. airport network

Summary

In this paper we seek to characterize traffic in the U.S. air transportation system, and to subsequently develop improved models of traffic demand. We model the air traffic within the U.S. national airspace system as dynamic weighted network. We employ techniques advanced by work in complex networks over the past several years in characterizing the structure and dynamics of the U.S. airport network. We show that the airport network is more dynamic over successive days than has been previously reported. The network has some properties that appear stationary over time, while others exhibit a high degree of variation. We characterize the network and its dynamics using structural measures such as degree distributions and clustering coefficients. We employ spectral analysis to show that dominant eigenvectors of the network are nearly stationary with time. We use this observation to suggest how low dimensional models of traffic demand in the airport network can be fashioned.
READ LESS

Summary

In this paper we seek to characterize traffic in the U.S. air transportation system, and to subsequently develop improved models of traffic demand. We model the air traffic within the U.S. national airspace system as dynamic weighted network. We employ techniques advanced by work in complex networks over the past...

READ MORE

Cluster-based 3D reconstruction of aerial video

Author:
Published in:
HPEC 2012: IEEE Conf. on High Performance Extreme Computing, 10-12 September 2012.

Summary

Large-scale 3D scene reconstruction using Structure from Motion (SfM) continues to be very computationally challenging despite much active research in the area. We propose an efficient, scalable processing chain designed for cluster computing and suitable for use on aerial video. The sparse bundle adjustment step, which is iterative and difficult to parallelize, is accomplished by partitioning the input image set, generating independent point clouds in parallel, and then fusing the clouds and combining duplicate points. We compare this processing chain to a leading parallel SfM implementation, which exploits fine-grained parallelism in various matrix operations and is not designed to scale beyond a multi-core workstation with GPU. We show our cluster-based approach offers significant improvement in scalability and runtime while producing comparable point cloud density and more accurate point location estimates.
READ LESS

Summary

Large-scale 3D scene reconstruction using Structure from Motion (SfM) continues to be very computationally challenging despite much active research in the area. We propose an efficient, scalable processing chain designed for cluster computing and suitable for use on aerial video. The sparse bundle adjustment step, which is iterative and difficult...

READ MORE