Publications

Refine Results

(Filters Applied) Clear All

75,000,000,000 streaming inserts/second using hierarchical hypersparse GraphBLAS matrices

Summary

The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrices. The parameters of hierarchical hypersparse matrices rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
READ LESS

Summary

The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates...

READ MORE

Large scale parallelization using file-based communications

Summary

In this paper, we present a novel and new file-based communication architecture using the local filesystem for large scale parallelization. This new approach eliminates the issues with filesystem overload and resource contention when using the central filesystem for large parallel jobs. The new approach incurs additional overhead due to inter-node message file transfers when both the sending and receiving processes are not on the same node. However, even with this additional overhead cost, its benefits are far greater for the overall cluster operation in addition to the performance enhancement in message communications for large scale parallel jobs. For example, when running a 2048-process parallel job, it achieved about 34 times better performance with MPI_Bcast() when using the local filesystem. Furthermore, since the security for transferring message files is handled entirely by using the secure copy protocol (scp) and the file system permissions, no additional security measures or ports are required other than those that are typically required on an HPC system.
READ LESS

Summary

In this paper, we present a novel and new file-based communication architecture using the local filesystem for large scale parallelization. This new approach eliminates the issues with filesystem overload and resource contention when using the central filesystem for large parallel jobs. The new approach incurs additional overhead due to inter-node...

READ MORE

FastDAWG: improving data migration in the BigDAWG polystore system

Published in:
Poly 2018/DMAH 2018, LNCS 11470, 2019, pp. 3–15.

Summary

The problem of data integration has been around for decades, yet a satisfactory solution has not yet emerged. A new type of system called a polystore has surfaced to partially address the integration problem. Based on experience with our own polystore called Big-DAWG, we identify three major roadblocks to an acceptable commercial solution. We offer a new architecture inspired by these three problems that trades some generality for usability. This architecture also exploits modern hardware (i.e., high-speed networks and RDMA) to gain performance. The paper concludes with some promising experimental results.
READ LESS

Summary

The problem of data integration has been around for decades, yet a satisfactory solution has not yet emerged. A new type of system called a polystore has surfaced to partially address the integration problem. Based on experience with our own polystore called Big-DAWG, we identify three major roadblocks to an...

READ MORE

Scaling big data platform for big data pipeline

Published in:
Submitted to Northeast Database Day, NEBD 2020, https://arxiv.org/abs/1902.03948

Summary

Monitoring and Managing High Performance Computing (HPC) systems and environments generate an ever growing amount of data. Making sense of this data and generating a platform where the data can be visualized for system administrators and management to proactively identify system failures or understand the state of the system requires the platform to be as efficient and scalable as the underlying database tools used to store and analyze the data. In this paper we will show how we leverage Accumulo, d4m, and Unity to generate a 3D visualization platform to monitor and manage the Lincoln Laboratory Supercomputer systems and how we have had to retool our approach to scale with our systems.
READ LESS

Summary

Monitoring and Managing High Performance Computing (HPC) systems and environments generate an ever growing amount of data. Making sense of this data and generating a platform where the data can be visualized for system administrators and management to proactively identify system failures or understand the state of the system requires...

READ MORE

A billion updates per second using 30,000 hierarchical in-memory D4M databases

Summary

Analyzing large scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for representing and analyzing streaming network data. The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database. Associative arrays are designed for block updates. Streaming updates to a large associative array requires a hierarchical implementation to optimize the performance of the memory hierarchy. Running 34,000 instances of a hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
READ LESS

Summary

Analyzing large scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for representing and analyzing streaming network data. The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in...

READ MORE

Hyperscaling internet graph analysis with D4M on the MIT SuperCloud

Summary

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environments, massive amount of packet capture (PCAP) data, and diverse data products for "at scale" algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud. The entire pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
READ LESS

Summary

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of...

READ MORE

TabulaROSA: tabular operating system architecture for massively parallel heterogeneous compute engines

Summary

The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In this context, an operating system can be viewed as software that brokers and tracks the resources of the compute engines and is akin to a database management system. To explore the idea of using a database in an operating system role, this work defines key operating system functions in terms of rigorous mathematical semantics (associative array algebra) that are directly translatable into database operations. These operations possess a number of mathematical properties that are ideal for parallel operating systems by guaranteeing correctness over a wide range of parallel operations. The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabularROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance as compared to Linux while managing 2000x more processes in fully searchable tables.
READ LESS

Summary

The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In...

READ MORE

Simulation approach to sensor placement using Unity3D

Summary

3D game simulation engines have demonstrated utility in the areas of training, scientific analysis, and knowledge solicitation. This paper will make the case for the use of 3D game simulation engines in the field of sensor placement optimization. Our study used a series of parallel simulations in the Unity3D simulation framework to answer the questions: how many sensors of various modalities are required and where they should be placed to meet a desired threat detection threshold? The result is a framework that not only answers this sensor placement question, but can be easily expanded to differing optimization criteria as well as answer how a particular configuration responds to differing crowd flows or informed/non-informed adversaries. Additionally, we demonstrate the scalability of this framework by running parallel instances on a supercomputing grid and illustrate the processing speed gained.
READ LESS

Summary

3D game simulation engines have demonstrated utility in the areas of training, scientific analysis, and knowledge solicitation. This paper will make the case for the use of 3D game simulation engines in the field of sensor placement optimization. Our study used a series of parallel simulations in the Unity3D simulation...

READ MORE

Colorization of H&E stained tissue using deep learning

Published in:
40th Int. Conf. of the IEEE Engineering in Medicine and Biology Society, EMBC, 17-21 July 2018.

Summary

Histopathology is a critical tool in the diagnosis and stratification of cancer. Digital Pathology involves the scanning of stained and fixed tissue samples to produce high-resolution images that can be used for computer-aided diagnosis and research. A common challenge in digital pathology related to the quality and characteristics of staining, which can vary widely from center to center and also within the same institution depending on the age of the stain and other human factors. In this paper we examine the use of deep learning models for colorizing H&E stained tissue images and compare the results with traditional image processing/statistical approaches that have been developed for standardizing or normalizing histopathology images. We adapt existing deep learning models that have been developed for colorizing natural images and compare the results with models developed specifically for digital pathology. Our results show that deep learning approaches can standardize the colorization of H&E images. The performance as measured by the chi-square statistic shows that the deep learning approach can be nearly as good as current state-of-the art normalization methods.
READ LESS

Summary

Histopathology is a critical tool in the diagnosis and stratification of cancer. Digital Pathology involves the scanning of stained and fixed tissue samples to produce high-resolution images that can be used for computer-aided diagnosis and research. A common challenge in digital pathology related to the quality and characteristics of staining...

READ MORE

On large-scale graph generation with validation of diverse triangle statistics at edges and vertices

Published in:
2018 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW, 21 May 2018.

Summary

Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators used in benchmarking are somewhat lacking in this respect due to their randomness: the output of a desired graph analytic can only be compared to expected values and not exact ground truth. Nonstochastic Kronecker product graphs meet these design criteria for several graph analytics. Here we show that many flavors of triangle participation can be cheaply calculated while generating a Kronecker product graph.
READ LESS

Summary

Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) with efficiently calculable ground-truth solutions to the desired output. Reproducibility for current generators used in benchmarking are somewhat lacking in this respect due to...

READ MORE