Sparse matrix partitioning for parallel eigenanalysis of large static and dynamic graphs

September 9, 2014

Conference Paper

Author:

Michael M. Wolf

…

Benjamin A. Miller

Published in:

HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. These problems become more difficult as the background graph grows larger and noisier and the coordination patterns become more subtle. In this paper, we discuss the computational challenges of a statistical framework designed to address this cross-mission challenge. The statistical framework is based on spectral analysis of the graph data, and three partitioning methods are evaluated for computing the principal eigenvector of the graph's residuals matrix. While a standard one-dimensional partitioning technique enables this computation for up to four billion vertices, the communication overhead prevents this method from being used for even larger graphs. Recent two-dimensional partitioning methods are shown to have much more favorable scaling properties. A data-dependent partitioning method, which has the best scaling performance, is also shown to improve computation time even as a graph changes over time, allowing amortization of the upfront cost.
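The core computation named in the summary, finding the principal eigenvector of a graph's residuals matrix, can be illustrated with a small serial sketch. The summary does not define the residuals matrix or name the eigensolver, so everything here is an assumption: the residuals are taken to be the modularity-style matrix B = A - k k^T / (2m) (A the adjacency matrix, k the degree vector, m the number of edges), and plain power iteration stands in for whatever solver the paper uses. The function names `residual_matvec` and `principal_eigenvector` are hypothetical. Power iteration converges to the eigenvalue of largest magnitude, which matches the principal eigenvalue only when that eigenvalue is positive, as it is in the small example described afterwards.

```python
import math


def residual_matvec(adj, degrees, two_m, x):
    """Return (A - k k^T / (2m)) x without forming the dense matrix.

    Assumed modularity-style residuals; adj is an adjacency list.
    """
    scale = sum(k * xi for k, xi in zip(degrees, x)) / two_m
    return [
        sum(x[j] for j in adj[i]) - degrees[i] * scale
        for i in range(len(adj))
    ]


def principal_eigenvector(edges, n, iters=200, tol=1e-12):
    """Power iteration on the assumed residuals matrix of an undirected graph."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    degrees = [len(nbrs) for nbrs in adj]
    two_m = sum(degrees)  # each undirected edge is counted twice

    # Deterministic, non-uniform start vector (illustrative choice only).
    x = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]

    eigval = 0.0
    for _ in range(iters):
        y = residual_matvec(adj, degrees, two_m, x)
        new_eigval = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
        converged = abs(new_eigval - eigval) < tol
        eigval = new_eigval
        if converged:
            break
    return eigval, x
```

Called on two disjoint triangles, `principal_eigenvector([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)], 6)` yields an eigenvalue close to 2 and a vector whose entries have opposite signs on the two triangles. The sparse matrix-vector product inside `residual_matvec` is the kind of kernel whose rows or nonzeros a partitioning scheme would distribute across processes; this sketch keeps everything on one process and makes no claim about the paper's one-dimensional or two-dimensional layouts.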