Publications

GraphChallenge.org triangle counting performance [e-print]

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. The triangle counting component of GraphChallenge.org tests the performance of graph processing systems to count all the triangles in a graph and exercises key graph operations found in many graph algorithms. In 2017, 2018, and 2019 many triangle counting submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers among these submissions. These submissions show that state-of-the-art triangle counting execution time, Ttri, is a strong function of the number of edges in the graph, Ne, and improved significantly from 2017 (Ttri \approx (Ne/10^8)^(4/3)) to 2018 (Ttri \approx Ne/10^9), remaining comparable from 2018 to 2019. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
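
As a rough illustration of the scaling trends quoted above, the sketch below evaluates the two empirical models for a few edge counts; the constants are the approximate fits from the abstract, and actual submission times vary widely around them.

```python
# Sketch: evaluate the empirical scaling trends quoted above for the
# best-performing submissions. Times are in seconds; the constants are
# approximate fits, so treat the outputs as order-of-magnitude estimates.

def t_tri_2017(n_edges: float) -> float:
    """2017 trend: Ttri ~ (Ne / 10^8)^(4/3)."""
    return (n_edges / 1e8) ** (4.0 / 3.0)

def t_tri_2018(n_edges: float) -> float:
    """2018-2019 trend: Ttri ~ Ne / 10^9."""
    return n_edges / 1e9

for n_e in (1e8, 1e9, 1e10):
    print(f"Ne = {n_e:.0e}: 2017 trend {t_tri_2017(n_e):10.1f} s, "
          f"2018 trend {t_tri_2018(n_e):8.1f} s")
```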

Attacking Embeddings to Counter Community Detection

Published in:
Network Science Society Conference 2020 [submitted]

Summary

Community detection can be an extremely useful data triage tool, enabling a data analyst to split a large network into smaller portions for a deeper analysis. If, however, a particular node wanted to avoid scrutiny, it could strategically create new connections that make it seem uninteresting. In this work, we investigate the use of a state-of-the-art attack against node embedding as a means of countering community detection while being blind to the attributes of others. The attack proposed in [1] attempts to maximize the loss function being minimized by a random-walk-based embedding method (where two nodes are made closer together the more often a random walk starting at one node ends at the other). We propose using this method to attack the community structure of the graph, specifically attacking the community assignment of an adversarial vertex. Since nodes in the same community tend to appear near each other in a random walk, their continuous-space embeddings also tend to be close. Thus, we aim to use the general embedding attack in an attempt to shift the community membership of the adversarial vertex.

To test this strategy, we adopt an experimental framework as in [2], where each node is given a “temperature” indicating how interesting it is. A node’s temperature can be “hot,” “cold,” or “unknown.” A node can perturb itself by adding new edges to any other node in the graph. The node’s goal is to be placed in a community that is cold, i.e., where the average node temperature is less than 0. Of the 5 attacks proposed in [2], we use 2 in our experiments. The simpler attack is Cold and Lonely, which first connects to cold nodes, then unknown, then hot, and connects within each temperature in order of increasing degree. The more sophisticated attack is Stable Structure. The procedure for this attack is to (1) identify stable structures (containing nodes assigned to the same community each time for several trials), (2) connect to nodes in order of increasing average temperature of their stable structures (randomly within a structure), and (3) connect to nodes with no stable structure in order of increasing temperature. As in [2], we use the Louvain modularity maximization technique for community detection. We slightly modify the embedding attack of [1] by only allowing addition of new edges and requiring that they include the adversary vertex. Since the embedding attack is blind to the temperatures of the nodes, experimenting with these attacks gives insight into how much this attribute information helps the adversary.

Experimental results are shown in Figure 1. Graphs considered in these experiments are (1) a 500-node Erdos-Renyi graph with edge probability p = 0.02, (2) a stochastic block model with 5 communities of 100 nodes each and edge probabilities of p_in = 0.06 and p_out = 0.01, (3) the network of Abu Sayyaf Group (ASG)—a violent non-state Islamist group operating in the Philippines—where two nodes are linked if they both participate in at least one kidnapping event, with labels derived from stable structures (nodes together in at least 95% of 1000 Louvain trials), and (4) the Cora machine learning citation graph, with 7 classes based on subject area. Temperature is assigned to the Erdos-Renyi nodes randomly with probability 0.25, 0.5, and 0.25 for hot, unknown, and cold, respectively. For the other graphs, nodes with the same label as the target are hot, unknown, and cold with probability 0.35, 0.55, and 0.1, respectively, and the hot and cold probabilities are swapped for other labels.
The results demonstrate that, even without the temperature information, the embedding method is about as effective as Cold and Lonely when there is community structure to exploit, though it is not as effective as Stable Structure, which leverages both community structure and temperature information.
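
As a concrete illustration of the Cold and Lonely ordering described above, the following sketch ranks candidate endpoints for the adversary's new edges. It is a minimal reconstruction assuming a networkx graph and an illustrative temperature encoding (-1 = cold, 0 = unknown, +1 = hot), not the reference implementation from [2].

```python
import networkx as nx

# Sketch of the "Cold and Lonely" ordering: connect the adversary to cold
# nodes first, then unknown, then hot, and within each temperature class in
# order of increasing degree. The temperature encoding (-1 = cold,
# 0 = unknown, +1 = hot) is an illustrative assumption.

def cold_and_lonely_order(graph: nx.Graph, adversary, temperature: dict) -> list:
    """Rank candidate endpoints for new edges from the adversary."""
    candidates = [v for v in graph.nodes
                  if v != adversary and not graph.has_edge(adversary, v)]
    # Sort by (temperature class, degree): cold < unknown < hot, low degree first.
    return sorted(candidates,
                  key=lambda v: (temperature.get(v, 0), graph.degree(v)))

def attack(graph: nx.Graph, adversary, temperature: dict, budget: int) -> nx.Graph:
    """Return a perturbed copy of the graph with `budget` new adversary edges."""
    perturbed = graph.copy()
    for v in cold_and_lonely_order(graph, adversary, temperature)[:budget]:
        perturbed.add_edge(adversary, v)
    return perturbed
```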

Complex Network Effects on the Robustness of Graph Convolutional Networks

Summary

Vertex classification—the problem of identifying the class labels of nodes in a graph—has applicability in a wide variety of domains. Examples include classifying subject areas of papers in citation networks or roles of machines in a computer network. Recent work has demonstrated that vertex classification using graph convolutional networks is susceptible to targeted poisoning attacks, in which both graph structure and node attributes can be changed in an attempt to misclassify a target node. This vulnerability decreases users’ confidence in the learning method and can prevent adoption in high-stakes contexts. This paper presents the first work aimed at leveraging network characteristics to improve robustness of these methods. Our focus is on using network features to choose the training set, rather than selecting the training set at random. Our alternative methods of selecting training data are (1) to select the highest-degree nodes in each class and (2) to iteratively select the node with the most neighbors minimally connected to the training set. In the datasets on which the original attack was demonstrated, we show that changing the training set can make the network much harder to attack. To maintain a given probability of attack success, the adversary must use far more perturbations, often a factor of 2–4 over the random training baseline. This increase in robustness is often as substantial as tripling the amount of randomly selected training data. Even in cases where success is relatively easy for the attacker, we show that classification performance degrades much more gradually using the proposed methods, with weaker incorrect predictions for the attacked nodes. Finally, we investigate the potential tradeoff between robustness and performance in various datasets.
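
As an illustration of training-set strategy (1) above, the sketch below selects the highest-degree vertices within each class; it assumes a networkx graph and uses hypothetical function names, and is not the paper's code.

```python
import networkx as nx
from collections import defaultdict

# Sketch of strategy (1): choose the highest-degree vertices within each
# class as the training set. Function and argument names are illustrative.

def highest_degree_training_set(graph: nx.Graph, labels: dict, per_class: int) -> list:
    """Return the `per_class` highest-degree nodes from each class."""
    by_class = defaultdict(list)
    for node, label in labels.items():
        by_class[label].append(node)
    training = []
    for nodes in by_class.values():
        nodes.sort(key=graph.degree, reverse=True)   # high degree first
        training.extend(nodes[:per_class])
    return training
```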

Multi-Objective Graph Matching via Signal Filtering

Published in:
IEEE Signal Processing Magazine Special Issue on GSP [submitted]

Summary

In this white paper we propose a new method which exploits tools from graph signal processing to solve the graph matching problem, the problem of estimating the correspondence between the vertex sets of two graphs. We recast the graph matching problem as matching multiple similarity matrices where the similarities are computed between filtered signals unique to each node. Using appropriate graph filters, these similarity matrices can emphasize long or short range behavior, and the method will implicitly search for similarities between the graphs at multiple scales. Our method shows substantial improvements over standard methods which use the raw adjacency matrices, especially in low-information environments.
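
The following sketch illustrates the general idea of matching graphs through filtered representations rather than raw adjacency matrices; the specific polynomial filters, the row-sorting trick, and the linear-assignment step are illustrative assumptions, not the authors' method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch: compare nodes of two equal-sized graphs through several filtered
# versions of their adjacency matrices (normalized matrix powers that
# emphasize progressively longer-range structure), then match with a linear
# assignment. These are illustrative choices, not the authors' method.

def polynomial_filters(A: np.ndarray, orders=(1, 2, 3)) -> list:
    """Return normalized powers of the adjacency matrix."""
    filters = []
    for k in orders:
        F = np.linalg.matrix_power(A, k)
        filters.append(F / (np.linalg.norm(F) + 1e-12))
    return filters

def match(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    """Estimate a vertex correspondence; assumes both graphs have n nodes."""
    n = A1.shape[0]
    score = np.zeros((n, n))
    for F1, F2 in zip(polynomial_filters(A1), polynomial_filters(A2)):
        # Sorting each node's filtered "signal" makes rows comparable across graphs.
        R1, R2 = np.sort(F1, axis=1), np.sort(F2, axis=1)
        score -= np.linalg.norm(R1[:, None, :] - R2[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(-score)   # maximize total similarity
    return cols                                  # cols[i]: estimated match of node i
```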

Sparse Deep Neural Network graph challenge

Published in:
IEEE High Performance Extreme Computing Conf., HPEC, 24-26 September 2019.

Summary

The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The proposed Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The Sparse DNN Challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. Sparse DNN inference is amenable to both vertex-centric implementations and array-based implementations (e.g., using the GraphBLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The input data sets are derived from the MNIST handwritten digits. The surrounding I/O and verification provide the context for each sparse DNN inference that allows rigorous definition of both the input and the output. Furthermore, since the proposed sparse DNN challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Reference implementations have been developed and their serial and parallel performance has been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
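
A minimal sketch of the layer-wise computation Y_{l+1} = ReLU(Y_l W_l + b_l) using scipy.sparse is shown below; bias handling and output clamping are simplified placeholders here, and the authoritative specification is at GraphChallenge.org.

```python
import numpy as np
import scipy.sparse as sp

# Minimal sketch of layer-wise sparse DNN inference,
# Y_{l+1} = ReLU(Y_l @ W_l + b_l), with the feature matrix Y and the weight
# matrices W stored in sparse (CSR) form. Bias handling and output clamping
# are simplified; see GraphChallenge.org for the exact specification.

def sparse_dnn_inference(Y0: sp.csr_matrix, weights, biases) -> sp.csr_matrix:
    """Run inference through a list of sparse weight matrices."""
    Y = Y0.tocsr()
    for W, b in zip(weights, biases):
        Z = (Y @ W).tocsr()             # sparse matrix-matrix multiply
        Z.data += b                     # add a scalar bias to stored entries
        Z.data = np.maximum(Z.data, 0)  # ReLU applied to the stored entries
        Z.eliminate_zeros()             # keep the representation sparse
        Y = Z
    return Y
```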

Fundamental Questions in the Analysis of Large Graphs

Summary

Graphs are a general approach for representing information that spans the widest possible range of computing applications. They are particularly important to computational biology, web search, and knowledge discovery. As the sizes of graphs increase, the need to apply advanced mathematical and computational techniques to solve these problems is growing dramatically. Examining the mathematical and computational foundations of the analysis of large graphs generally leads to more questions than answers. This book concludes with a discussion of some of these questions.

Visualizing Large Kronecker Graphs

Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 241-250.

Summary

Kronecker graphs have been shown to be one of the most promising models for real-world networks. Visualization of Kronecker graphs is an important challenge. This chapter describes an interactive framework to assist scientists and engineers in generating, analyzing, and visualizing Kronecker graphs with as little effort as possible.
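
For context, an instance Kronecker graph can be generated by repeatedly taking the Kronecker product of a small seed adjacency matrix with itself; the minimal sketch below illustrates the construction only and is not the chapter's interactive framework.

```python
import numpy as np

# Minimal sketch of instance Kronecker graph generation (not the chapter's
# interactive framework): repeatedly take the Kronecker product of a small
# seed adjacency matrix with itself.

def kronecker_power(seed: np.ndarray, k: int) -> np.ndarray:
    """Return the k-th Kronecker power of the seed adjacency matrix."""
    A = seed.copy()
    for _ in range(k - 1):
        A = np.kron(A, seed)
    return A

seed = np.array([[0, 1, 1],
                 [1, 0, 1],
                 [1, 1, 0]])            # 3-vertex seed graph (a triangle)
A = kronecker_power(seed, 3)            # 27-vertex instance Kronecker graph
print(A.shape[0], "vertices,", int(A.sum()) // 2, "edges")
```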

Subgraph Detection

Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 115-133.

Summary

Detecting subgraphs of interest in larger graphs is the goal of many graph analysis techniques. The basis of detection theory is computing the probability of a “foreground” with respect to a model of the “background” data. Hidden Markov Models represent one possible foreground model for patterns of interaction in a graph. Likewise, Kronecker graphs are one possible model for power law background graphs. Combining these models allows estimates of the signal to noise ratio, probability of detection, and probability of false alarm for different classes of vertices in the foreground. These estimates can then be used to construct filters for computing the probability that a background graph contains a particular foreground graph. This approach is applied to the problem of detecting a partially labeled tree graph in a power law background graph. One feature of this method is the ability to a priori estimate the number of vertices that will be detected via the filter.
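
As a generic illustration of the detection-theory quantities mentioned above (probability of detection and probability of false alarm for a thresholded test statistic), the sketch below uses a simple Gaussian model; it is not the chapter's HMM/Kronecker construction.

```python
from scipy.stats import norm

# Generic illustration of detection-theory quantities (not the chapter's
# HMM/Kronecker construction): a test statistic is N(0, 1) under the
# background-only hypothesis and N(snr, 1) when the foreground is present;
# a detection is declared when the statistic exceeds a threshold tau.

def prob_false_alarm(tau: float) -> float:
    return norm.sf(tau)                 # P(statistic > tau | background only)

def prob_detection(tau: float, snr: float) -> float:
    return norm.sf(tau - snr)           # P(statistic > tau | foreground present)

tau = norm.isf(1e-3)                    # threshold giving a 0.1% false-alarm rate
print(f"tau = {tau:.2f}, Pfa = {prob_false_alarm(tau):.1e}, "
      f"Pd at SNR 4 = {prob_detection(tau, 4.0):.3f}")
```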

The Kronecker theory of power law graphs

Published in:
Graph Algorithms in the Language of Linear Algebra, pp. 205-220.

Summary

An analytical theory of power law graphs is presented based on the Kronecker graph generation technique. Explicit, stochastic, and instance Kronecker graphs are used to highlight different properties. The analysis uses Kronecker exponentials of complete bipartite graphs to formulate the substructure of such graphs. The Kronecker theory allows various high-level quantities (e.g., degree distribution, betweenness centrality, diameter, eigenvalues, and iso-parametric ratio) to be computed directly from the model parameters.
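
As a concrete illustration of the objects involved, the sketch below forms a Kronecker exponential of a complete bipartite graph's adjacency matrix and tabulates its degree distribution; the chapter derives such quantities analytically from the model parameters rather than by explicit construction.

```python
import numpy as np
from collections import Counter

# Sketch: form the Kronecker exponential (repeated Kronecker product) of the
# adjacency matrix of a complete bipartite graph B(n, 1) and tabulate the
# resulting degree distribution, which is heavy-tailed. The chapter derives
# such quantities in closed form from the model parameters.

def complete_bipartite(n: int, m: int) -> np.ndarray:
    A = np.zeros((n + m, n + m), dtype=int)
    A[:n, n:] = 1
    A[n:, :n] = 1
    return A

def kron_power(A: np.ndarray, k: int) -> np.ndarray:
    out = A.copy()
    for _ in range(k - 1):
        out = np.kron(out, A)
    return out

A = kron_power(complete_bipartite(5, 1), 3)   # third Kronecker power of B(5,1)
degree_counts = Counter(A.sum(axis=1).tolist())
print(sorted(degree_counts.items()))          # e.g. 125 vertices of degree 1, ..., 1 of degree 125
```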

3-d graph processor

Summary

Graph algorithms are used for numerous database applications such as analysis of financial transactions, social networking patterns, and internet data. While graph algorithms can work well with moderate size databases, processors often have difficulty providing sufficient throughput when the databases are large. This is because the processor architectures are poorly matched to the graph computational flow. For example, most modern processors utilize cache based memory in order to take advantage of highly localized memory access patterns. However, memory access patterns associated with graph processing are often random in nature and can result in high cache miss rates. In addition, graph algorithms require significant overhead computation for dealing with indices of vertices and edges.