Publications

Refine Results

(Filters Applied) Clear All

GraphChallenge.org triangle counting performance [e-print]

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. The triangle counting component of GraphChallenge.org tests the performance of graph processing systems to count all the triangles in a graph and exercises key graph operations found in many graph algorithms. In 2017, 2018, and 2019 many triangle counting submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers of these submissions. These submissions show that their state-of-the-art triangle counting execution time, Ttri, is a strong function of the number of edges in the graph, Ne, which improved significantly from 2017 (Ttri \approx (Ne/10^8)^4=3) to 2018 (Ttri \approx Ne/10^9) and remained comparable from 2018 to 2019. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

Leveraging linear algebra to count and enumerate simple subgraphs

Published in:
2020 IEEE High Performance Extreme Computing Conf., HPEC, 22-24 September 2020.

Summary

Even though subgraph counting and subgraph matching are well-known NP-Hard problems, they are foundational building blocks for many scientific and commercial applications. In order to analyze graphs that contain millions to billions of edges, distributed systems can provide computational scalability through search parallelization. One recent approach for exposing graph algorithm parallelization is through a linear algebra formulation and the use of the matrix multiply operation, which conceptually is equivalent to a massively parallel graph traversal. This approach has several benefits, including 1) a mathematically-rigorous foundation, and 2) ability to leverage specialized linear algebra accelerators and high-performance libraries. In this paper, we explore and define a linear algebra methodology for performing exact subgraph counting and matching for 4-vertex subgraphs excluding the clique. Matches on these simple subgraphs can be joined as components for a larger subgraph. With thorough analysis, we demonstrate that the linear algebra formulation leverages path aggregation which allows it to be up 2x to 5x more efficient in traversing the search space and compressing the results as compared to tree-based subgraph matching techniques.
READ LESS

Summary

Even though subgraph counting and subgraph matching are well-known NP-Hard problems, they are foundational building blocks for many scientific and commercial applications. In order to analyze graphs that contain millions to billions of edges, distributed systems can provide computational scalability through search parallelization. One recent approach for exposing graph algorithm...

READ MORE

A hardware root-of-trust design for low-power SoC edge devices

Published in:
2020 IEEE High Performance Extreme Computing Conf., HPEC, 22-24 September 2020.

Summary

In this work, we introduce a hardware root-of-trust architecture for low-power edge devices. An accelerator-based SoC design that includes the hardware root-of-trust architecture is developed. An example application for the device is presented. We examine attacks based on physical access given the significant threat they pose to unattended edge systems. The hardware root-of-trust provides security features to ensure the integrity of the SoC execution environment when deployed in uncontrolled, unattended locations. E-fused boot memory ensures the boot code and other security critical software is not compromised after deployment. Digitally signed programmable instruction memory prevents execution of code from untrusted sources. A programmable finite state machine is used to enforce access policies to device resources even if the application software on the device is compromised. Access policies isolate the execution states of application and security-critical software. The hardware root-of-trust architecture saves energy with a lower hardware overhead than a separate secure enclave while eliminating software attack surfaces for access control policies.
READ LESS

Summary

In this work, we introduce a hardware root-of-trust architecture for low-power edge devices. An accelerator-based SoC design that includes the hardware root-of-trust architecture is developed. An example application for the device is presented. We examine attacks based on physical access given the significant threat they pose to unattended edge systems...

READ MORE

GraphChallenge.org: raising the bar on graph analytic performance

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. Graph Challenge 2017 received 22 submissions by 111 authors from 36 organizations. The submissions highlighted graph analytic innovations in hardware, software, algorithms, systems, and visualization. These submissions produced many comparable performance measurements that can be used for assessing the current state of the art of the field. There were numerous submissions that implemented the triangle counting challenge and resulted in over 350 distinct measurements. Analysis of these submissions show that their execution time is a strong function of the number of edges in the graph, Ne, and is typically proportional to N4=3 e for large values of Ne. Combining the model fits of the submissions presents a picture of the current state of the art of graph analysis, which is typically 108 edges processed per second for graphs with 108 edges. These results are 30 times faster than serial implementations commonly used by many graph analysts and underscore the importance of making these performance benefits available to the broader community. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

Cloud computing in tactical environments

Summary

Ground personnel at the tactical edge often lack data and analytics that would increase their effectiveness. To address this problem, this work investigates methods to deploy cloud computing capabilities in tactical environments. Our approach is to identify representative applications and to design a system that spans the software/hardware stack to support such applications while optimizing the use of scarce resources. This paper presents our high-level design and the results of initial experiments that indicate the validity of our approach.
READ LESS

Summary

Ground personnel at the tactical edge often lack data and analytics that would increase their effectiveness. To address this problem, this work investigates methods to deploy cloud computing capabilities in tactical environments. Our approach is to identify representative applications and to design a system that spans the software/hardware stack to...

READ MORE

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.
READ LESS

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...

READ MORE

Static graph challenge: subgraph isomorphism

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
READ LESS

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges...

READ MORE

Performance measurements of supercomputing and cloud storage solutions

Summary

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an opensource, high performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering, and offers a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.
READ LESS

Summary

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in...

READ MORE

Fabrication security and trust of domain-specific ASIC processors

Summary

Application specific integrated circuits (ASICs) are commonly used to implement high-performance signal-processing systems for high-volume applications, but their high development costs and inflexible nature make ASICs inappropriate for algorithm development and low-volume DoD applications. In addition, the intellectual property (IP) embedded in the ASIC is at risk when fabricated in an untrusted foundry. Lincoln Laboratory has developed a flexible signal-processing architecture to implement a wide range of algorithms within one application domain, for example radar signal processing. In this design methodology, common signal processing kernels such as digital filters, fast Fourier transforms (FFTs), and matrix transformations are implemented as optimized modules, which are interconnected by a programmable wiring fabric that is similar to the interconnect in a field programmable gate array (FPGA). One or more programmable microcontrollers are also embedded in the fabric to sequence the operations. This design methodology, which has been termed a coarse-grained FPGA, has been shown to achieve a near ASIC level of performance. In addition, since the signal processing algorithms are expressed in firmware that is loaded at runtime, the important application details are protected from an unscrupulous foundry.
READ LESS

Summary

Application specific integrated circuits (ASICs) are commonly used to implement high-performance signal-processing systems for high-volume applications, but their high development costs and inflexible nature make ASICs inappropriate for algorithm development and low-volume DoD applications. In addition, the intellectual property (IP) embedded in the ASIC is at risk when fabricated in...

READ MORE

Development of a high-throughput microwave imaging system for concealed weapons detection

Summary

A video-rate microwave imaging aperture for concealed threat detection can serve as a useful tool in securing crowded, high foot traffic environments. Realization of such a system presents two major technical challenges: 1) implementation of an electrically large antenna array for capture of a moving subject, and 2) fast image reconstruction on cost-effective computing hardware. This paper presents a hardware-efficient multistatic array design to address the former challenge, and a compatible fast imaging technique to address the latter. Prototype hardware which forms a partition of an imaging aperture is discussed. Using this hardware, it is shown that the proposed array design can be used to form high-fidelity 3D images, and that the presented image reconstruction technique can form an image of a human-sized domain in ≤ 0.1s with low cost computing hardware.
READ LESS

Summary

A video-rate microwave imaging aperture for concealed threat detection can serve as a useful tool in securing crowded, high foot traffic environments. Realization of such a system presents two major technical challenges: 1) implementation of an electrically large antenna array for capture of a moving subject, and 2) fast image...

READ MORE