Publications

By Charles Yee

Driving big data with big compute

September 10, 2012

Conference Paper

Author:

Chansup Byun

…

Published in:

HPEC 2012: IEEE Conf. on High Performance Extreme Computing, 10-12 September 2012.

Topic:

high performance computing

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use these tools, which is typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.
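For readers unfamiliar with the map/reduce programming model mentioned in the summary, the following is a minimal single-process sketch in Python. It is not the LLGrid MapReduce interface, which this listing does not describe; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_map_reduce`) and the word-count task are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_map_reduce(records, mapper, reducer):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_count_mapper(line):
    """Emit (word, 1) for each whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


counts = run_map_reduce(["big data", "big compute"], word_count_mapper, sum_reducer)
```

On a real cluster the map and reduce phases would run as separate parallel tasks; here they run in one process so that only the data flow is visible.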


Driving big data with big compute

September 10, 2012

Conference Paper

Author:

Chansup Byun

…

Published in:

HPEC 2012: IEEE Conf. on High Performance Extreme Computing, 10-12 September 2012.

Topic:

supercomputing

R&D area:

Cyber Security and Information Sciences

R&D group:

Embedded and AI Systems


Dynamic Distributed Dimensional Data Model (D4M) database and computation system

March 25, 2012

Conference Paper

Author:

Jeremy Kepner

…

Published in:

ICASSP 2012, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 25-30 March 2012, pp. 5349-52.

Topic:

big data

R&D area:

R&D group:

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language "SQL" databases). D4M allows linear algebra to be readily applied to databases. Using D4M, it is possible to create composable analytics with significantly less effort than using traditional approaches. This work describes the D4M technology and its application and performance.
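The idea of applying linear algebra to tuple-store data can be illustrated with a toy associative array: a sparse matrix whose rows and columns are indexed by strings, built directly from (row, column, value) triples. The sketch below is in Python and is not the D4M API; the `Assoc` class, its methods, and the document/word example data are assumptions made for illustration only.

```python
from collections import defaultdict


class Assoc:
    """Toy associative array: a sparse matrix keyed by string row and column labels."""

    def __init__(self, triples=()):
        # Store nonzero entries as {(row, col): value}; repeated keys are summed.
        self.data = {}
        for row, col, val in triples:
            self.data[(row, col)] = self.data.get((row, col), 0) + val

    def triples(self):
        """Return the stored entries as (row, col, value) tuples."""
        return [(r, c, v) for (r, c), v in self.data.items()]

    def transpose(self):
        """Swap row and column keys."""
        return Assoc((c, r, v) for (r, c), v in self.data.items())

    def __matmul__(self, other):
        """Sparse matrix product: sum over the shared inner key."""
        by_row = defaultdict(list)
        for (r, c), v in other.data.items():
            by_row[r].append((c, v))
        out = defaultdict(int)
        for (r, k), v in self.data.items():
            for c, w in by_row.get(k, ()):
                out[(r, c)] += v * w
        return Assoc((r, c, v) for (r, c), v in out.items())


# Hypothetical tuple-store contents: which words occur in which documents.
edges = Assoc([
    ("doc1", "word|big", 1),
    ("doc1", "word|data", 1),
    ("doc2", "word|big", 1),
    ("doc2", "word|compute", 1),
])

# Word-by-word co-occurrence counts, expressed as a single matrix product.
cooc = edges.transpose() @ edges
```

Under these assumptions, an analytic such as word co-occurrence becomes one line of algebra over the stored triples, which is the kind of composability the summary refers to.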
