Publications

From NoSQL Accumulo to NewSQL Graphulo: design and utility of graph algorithms inside a BigTable database

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

Google BigTable's scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building graph algorithms. In this article we show how it is possible to implement the GraphBLAS kernels in a BigTable database by presenting the design of Graphulo, a library for executing graph algorithms inside the Apache Accumulo database. We detail the Graphulo implementation of two graph algorithms and conduct experiments comparing their performance to two main-memory matrix math systems. Our results offer insight into the conditions that determine when executing a graph algorithm is faster inside a database than in an external system: in short, memory requirements and relative I/O are critical factors.

Julia implementation of the Dynamic Distributed Dimensional Data Model

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation.

Taming biological big data with D4M

Published in:
Lincoln Laboratory Journal, Vol. 20, No. 1, 2013, pp. 82-91.

Summary

The supercomputing community has taken up the challenge of "taming the beast" spawned by the massive amount of data available in the bioinformatics domain: How can these data be exploited faster and better? MIT Lincoln Laboratory computer scientists demonstrated how a new Laboratory-developed technology, the Dynamic Distributed Dimensional Data Model (D4M), can be used to accelerate DNA sequence comparison, a core operation in bioinformatics.
