Publications

Refine Results

(Filters Applied) Clear All

GraphChallenge.org triangle counting performance [e-print]

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. The triangle counting component of GraphChallenge.org tests the performance of graph processing systems to count all the triangles in a graph and exercises key graph operations found in many graph algorithms. In 2017, 2018, and 2019 many triangle counting submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers of these submissions. These submissions show that their state-of-the-art triangle counting execution time, Ttri, is a strong function of the number of edges in the graph, Ne, which improved significantly from 2017 (Ttri \approx (Ne/10^8)^4=3) to 2018 (Ttri \approx Ne/10^9) and remained comparable from 2018 to 2019. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

GraphChallenge.org: raising the bar on graph analytic performance

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. Graph Challenge 2017 received 22 submissions by 111 authors from 36 organizations. The submissions highlighted graph analytic innovations in hardware, software, algorithms, systems, and visualization. These submissions produced many comparable performance measurements that can be used for assessing the current state of the art of the field. There were numerous submissions that implemented the triangle counting challenge and resulted in over 350 distinct measurements. Analysis of these submissions show that their execution time is a strong function of the number of edges in the graph, Ne, and is typically proportional to N4=3 e for large values of Ne. Combining the model fits of the submissions presents a picture of the current state of the art of graph analysis, which is typically 108 edges processed per second for graphs with 108 edges. These results are 30 times faster than serial implementations commonly used by many graph analysts and underscore the importance of making these performance benefits available to the broader community. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

Influence estimation on social media networks using causal inference

Published in:
Proc. IEEE Statistical Signal Processing (SSP) Workshop, 10-13 June 2018.

Summary

Estimating influence on social media networks is an important practical and theoretical problem, especially because this new medium is widely exploited as a platform for disinformation and propaganda. This paper introduces a novel approach to influence estimation on social media networks and applies it to the real-world problem of characterizing active influence operations on Twitter during the 2017 French presidential elections. The new influence estimation approach attributes impact by accounting for narrative propagation over the network using a network causal inference framework applied to data arising from graph sampling and filtering. This causal framework infers the difference in outcome as a function of exposure, in contrast to existing approaches that attribute impact to activity volume or topological features, which do not explicitly measure nor necessarily indicate actual network influence. Cramér-Rao estimation bounds are derived for parameter estimation as a step in the causal analysis, and used to achieve geometrical insight on the causal inference problem. The ability to infer high causal influence is demonstrated on real-world social media accounts that are later independently confirmed to be either directly affiliated or correlated with foreign influence operations using evidence supplied by the U.S. Congress and journalistic reports.
READ LESS

Summary

Estimating influence on social media networks is an important practical and theoretical problem, especially because this new medium is widely exploited as a platform for disinformation and propaganda. This paper introduces a novel approach to influence estimation on social media networks and applies it to the real-world problem of characterizing...

READ MORE

Hybrid mixed-membership blockmodel for inference on realistic network interactions

Published in:
IEEE Trans. Netw. Sci. Eng., Vol. 6, No. 3, July-Sept. 2019.

Summary

This work proposes novel hybrid mixed-membership blockmodels (HMMB) that integrate three canonical network models to capture the characteristics of real-world interactions: community structure with mixed-membership, power-law-distributed node degrees, and sparsity. This hybrid model provides the capacity needed for realism, enabling control and inference on individual attributes of interest such as mixed-membership and popularity. A rigorous inference procedure is developed for estimating the parameters of this model through iterative Bayesian updates, with targeted initialization to improve identifiability. For the estimation of mixed-membership parameters, the Cramer-Rao bound is derived by quantifying the information content in terms of the Fisher information matrix. The effectiveness of the proposed inference is demonstrated in simulations where the estimates achieve covariances close to the Cramer-Rao bound while maintaining good truth coverage. We illustrate the utility of the proposed model and inference procedure in the application of detecting a community from a few cue nodes, where success depends on accurately estimating the mixed-memberships. Performance evaluations on both simulated and real-world data show that inference with HMMB is able to recover mixed-memberships in the presence of challenging community overlap, leading to significantly improved detection performance over algorithms based on network modularity and simpler models.
READ LESS

Summary

This work proposes novel hybrid mixed-membership blockmodels (HMMB) that integrate three canonical network models to capture the characteristics of real-world interactions: community structure with mixed-membership, power-law-distributed node degrees, and sparsity. This hybrid model provides the capacity needed for realism, enabling control and inference on individual attributes of interest such as...

READ MORE

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.
READ LESS

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...

READ MORE

Static graph challenge: subgraph isomorphism

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
READ LESS

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges...

READ MORE

Causal inference under network interference: a framework for experiments on social networks

Author:
Published in:
Thesis (Ph.D.)--Harvard University, 2017.

Summary

No man is an island, as individuals interact and influence one another daily in our society. When social influence takes place in experiments on a population of interconnected individuals, the treatment on a unit may affect the outcomes of other units, a phenomenon known as interference. This thesis develops a causal framework and inference methodology for experiments where interference takes place on a network of influence (i.e. network interference). In this framework, the network potential outcomes serve as the key quantity and flexible building blocks for causal estimands that represent a variety of primary, peer, and total treatment effects. These causal estimands are estimated via principled Bayesian imputation of missing outcomes. The theory on the unconfoundedness assumptions leading to simplified imputation highlights the importance of including relevant network covariates in the potential outcome model. Additionally, experimental designs that result in balanced covariates and sizes across treatment exposure groups further improve the causal estimate, especially by mitigating potential outcome model mis-specification. The true potential outcome model is not typically known in real-world experiments, so the best practice is to account for interference and confounding network covariates through both balanced designs and model-based imputation. A full factorial simulated experiment is formulated to demonstrate this principle by comparing performance across different randomization schemes during the design phase and estimators during the analysis phase, under varying network topology and true potential outcome models. Overall, this thesis asserts that interference is not just a nuisance for analysis but rather an opportunity for quantifying and leveraging peer effects in real-world experiments.
READ LESS

Summary

No man is an island, as individuals interact and influence one another daily in our society. When social influence takes place in experiments on a population of interconnected individuals, the treatment on a unit may affect the outcomes of other units, a phenomenon known as interference. This thesis develops a...

READ MORE

Bayesian discovery of threat networks

Published in:
IEEE Trans. Signal Process., Vol. 62, No. 20, 15 October 2014, pp. 5324-38.

Summary

A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a graph, at least one observation, and a diffusion model for threat. A link to well-known spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio-temporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random covert networks with realistic properties.
READ LESS

Summary

A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a...

READ MORE

Estimation of Causal Peer Influence Effects

Author:
Published in:
International Conference on Machine Learning, 17-19 June 2013

Summary

The broad adoption of social media has generated interest in leveraging peer influence for inducing desired user behavior. Quantifying the causal effect of peer influence presents technical challenges, however, including how to deal with social interference, complex response functions and network uncertainty. In this paper, we extend potential outcomes to allow for interference, we introduce welldefined causal estimands of peer-influence, and we develop two estimation procedures: a frequentist procedure relying on a sequential randomization design that requires knowledge of the network but operates under complicated response functions, and a Bayesian procedure which accounts for network uncertainty but relies on a linear response assumption to increase estimation precision. Our results show the advantages and disadvantages of the proposed methods in a number of situations.
READ LESS

Summary

The broad adoption of social media has generated interest in leveraging peer influence for inducing desired user behavior. Quantifying the causal effect of peer influence presents technical challenges, however, including how to deal with social interference, complex response functions and network uncertainty. In this paper, we extend potential outcomes to...

READ MORE

Showing Results

1-9 of 9