Publications


In-storage embedded accelerator for sparse pattern processing

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing, bioinformatics, subgraph matching, machine learning, and graph processing. One slice of our prototype accelerator is capable of handling up to 1 TB of data, and experiments show that it can outperform C/C++ software solutions on a 16-core system at a fraction of the power and cost; an optimized version of the accelerator can match the performance of a 48-core server.
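
As a software point of reference, the sketch below shows the kind of sparse pattern matching the accelerator targets: documents and query patterns held as sparse term-to-weight maps and scored by a dot product over their shared terms. The data model and function names are illustrative assumptions, not the hardware design.

```python
# Minimal sketch of sparse pattern matching (illustrative data model:
# each document and the query pattern are sparse {term: weight} dicts).

def sparse_score(pattern, document):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(pattern) > len(document):  # iterate over the smaller map
        pattern, document = document, pattern
    return sum(w * document[t] for t, w in pattern.items() if t in document)

def top_matches(pattern, corpus, k=3):
    """Rank corpus documents by sparse similarity to the query pattern."""
    scored = ((sparse_score(pattern, doc), doc_id) for doc_id, doc in corpus.items())
    return sorted(scored, reverse=True)[:k]

corpus = {
    "doc1": {"graph": 2.0, "sparse": 1.0},
    "doc2": {"flash": 1.0, "storage": 3.0, "sparse": 1.0},
}
print(top_matches({"sparse": 1.0, "storage": 1.0}, corpus))
# [(4.0, 'doc2'), (1.0, 'doc1')]
```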

Novel graph processor architecture, prototype system, and results

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's graph-processor architecture represents a rethinking of parallel architectures for graph problems. Our processor utilizes innovations that include a sparse matrix-based graph instruction set, a cacheless memory system, an accelerator-based architecture, a systolic sorter, a high-bandwidth multidimensional toroidal communication network, and randomized communications. A field-programmable gate array (FPGA) prototype of the new graph processor has been developed and demonstrates significant gains in graph computational throughput over conventional processors.
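
A sparse matrix-based graph instruction set rests on the observation that graph traversal can be cast as sparse linear algebra. The sketch below illustrates that idea in software only: one breadth-first step expressed as a matrix-vector product in a Boolean semiring, over an adjacency-set representation. It is a conceptual illustration, not the processor's actual instruction set.

```python
# Conceptual sketch: BFS as iterated sparse matrix-vector products.
# adjacency stored as {src: {dst, ...}}, i.e., the rows of a sparse matrix.

adj = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}

def bfs_step(frontier, adj):
    """One frontier expansion: y = A^T x in Boolean-semiring terms."""
    nxt = set()
    for v in frontier:
        nxt |= adj.get(v, set())
    return nxt

visited = frontier = {0}
while frontier:
    frontier = bfs_step(frontier, adj) - visited  # drop already-seen vertices
    visited = visited | frontier
print(sorted(visited))  # [0, 1, 2, 3]
```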

From NoSQL Accumulo to NewSQL Graphulo: design and utility of graph algorithms inside a BigTable database

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

Google BigTable's scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building graph algorithms. In this article we show how the GraphBLAS kernels can be implemented in a BigTable database by presenting the design of Graphulo, a library for executing graph algorithms inside the Apache Accumulo database. We detail the Graphulo implementation of two graph algorithms and conduct experiments comparing their performance to two main-memory matrix math systems. Our results provide insight into the conditions that determine when executing a graph algorithm is faster inside a database than in an external system: in short, memory requirements and relative I/O are the critical factors.
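
The kernel at the heart of this approach is sparse matrix multiplication over a semiring. The sketch below expresses it over BigTable-style (row, column, value) triples; the function names and dict-based storage are illustrative assumptions, not Graphulo's actual API or Accumulo's iterator framework.

```python
from collections import defaultdict

# Sketch of the central GraphBLAS kernel, SpGEMM (C = A B over a semiring),
# written over triple stores {(row, col): value}. Illustrative only.

def spgemm(A, B, plus=lambda x, y: x + y, times=lambda x, y: x * y):
    """Semiring sparse matrix multiply of two {(row, col): value} dicts."""
    B_by_row = defaultdict(dict)
    for (k, j), v in B.items():
        B_by_row[k][j] = v
    C = {}
    for (i, k), a in A.items():
        for j, b in B_by_row.get(k, {}).items():
            prod = times(a, b)
            C[(i, j)] = plus(C[(i, j)], prod) if (i, j) in C else prod
    return C

A = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
print(spgemm(A, A))  # {(0, 0): 1, (0, 1): 8, (1, 1): 9}
```

Swapping in a different `plus`/`times` pair (e.g., min and +) yields other graph kernels, which is the appeal of the semiring formulation.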

Julia implementation of the Dynamic Distributed Dimensional Data Model

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation.
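
For readers unfamiliar with the associative-array model, the toy sketch below (in Python, with illustrative names rather than D4M's actual API) shows the core idea: a two-dimensional array keyed by strings, where element-wise addition takes the union of the two arrays' keys.

```python
# Toy sketch of an associative array in the spirit of D4M. Illustrative
# only; not the D4M (Matlab) or Julia implementation's API.

class AssocArray:
    def __init__(self, triples):
        self.data = dict(triples)  # {(row_key, col_key): value}

    def __add__(self, other):
        """Element-wise sum; keys present in either operand survive."""
        out = dict(self.data)
        for k, v in other.data.items():
            out[k] = out.get(k, 0) + v
        return AssocArray(out)

    def row(self, r):
        """All (column, value) pairs in a given row."""
        return {c: v for (rr, c), v in self.data.items() if rr == r}

A = AssocArray({("alice", "word|graph"): 1, ("bob", "word|julia"): 2})
B = AssocArray({("alice", "word|graph"): 3})
print((A + B).row("alice"))  # {'word|graph': 4}
```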

Designing a new high performance computing education strategy for professional scientists and engineers

Summary

For decades the High Performance Computing (HPC) community has used web content, workshops, and embedded HPC scientists to enable practitioners to harness the power of parallel and distributed computing. The most successful approaches, face-to-face tutorials and embedded professionals, don't scale. To create scalable, flexible educational experiences for practitioners in all phases of a career, from student to professional, we turn to Massive Open Online Courses (MOOCs). We detail the conversion of personalized tutorials to a self-paced online course. In this demonstration, we highlight a course that mimics in-person tutorials by providing personalized paths through content that interleaves theory and practice, helping researchers learn key parallel computing concepts while developing familiarity with their HPC target system.

Enforced sparse non-negative matrix factorization

Published in:
30th IEEE Int. Parallel and Distributed Processing Symp., IPDPS 2016, 23-27 May 2016.

Summary

Non-negative matrix factorization (NMF) is a dimensionality reduction algorithm for data that can be represented as an undirected bipartite graph. It has become a common method for generating topic models of text data because it is known to produce good results, despite its relative simplicity of implementation and ease of computation. One challenge with applying the NMF to large datasets is that intermediate matrix products often become dense, thus stressing the memory and compute elements of the underlying system. In this article, we investigate a simple but powerful modification of the alternating least squares method of determining the NMF of a sparse matrix that enforces the generation of sparse intermediate and output matrices. This method enables the application of NMF to large datasets through improved memory and compute performance. Further, we demonstrate, empirically, that this method of enforcing sparsity in the NMF either preserves or improves both the accuracy of the resulting topic model and the convergence rate of the underlying algorithm.
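
The sketch below illustrates the general idea: alternating-least-squares NMF in which, after each update, small entries of the factors are zeroed so that the factors (and hence the intermediate products) stay sparse. The rank, thresholding rule, and iteration count here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sparse_nmf(A, k, iters=50, keep=0.5, eps=1e-9):
    """ALS-style NMF with enforced sparsity (illustrative sketch)."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(iters):
        # Alternating least squares with non-negativity projection.
        H = np.clip(np.linalg.lstsq(W, A, rcond=None)[0], 0, None)
        W = np.clip(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0, None)
        for M in (W, H):
            # Enforce sparsity: zero all but roughly the largest `keep` fraction.
            cut = np.quantile(M, 1.0 - keep)
            M[M < max(cut, eps)] = 0.0
    return W, H

A = np.abs(np.random.default_rng(1).random((8, 6)))
W, H = sparse_nmf(A, k=2)
print(np.linalg.norm(A - W @ H))  # reconstruction error
```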

Storage and Database Management for Big Data

Published in:
Big Data: Storage, Sharing, and Security

Summary

The ability to collect and analyze large amounts of data is a growing challenge within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. While there has been great progress in the world of database technologies in the past few years, there are still many fundamental decisions that scientists must make. For example, which of the seemingly infinite technologies is best suited to my problem? Answers to such questions require a careful understanding of the technology field in addition to the types of problems being solved. This chapter aims to address many of the pressing questions faced by individuals interested in using storage or database technologies to solve their big data problems.

D4M and large array databases for management and analysis of large biomedical imaging data

Summary

Advances in medical imaging technologies have enabled the acquisition of increasingly large datasets. Current state-of-the-art confocal or multi-photon imaging technology can produce biomedical datasets in excess of 1 TB per dataset. Typical approaches for analyzing large datasets rely on downsampling the original datasets or leveraging distributed computing resources where small subsets of images are processed independently. These approaches require significant overhead on the part of the programmer to load the desired sub-volume from an array of image files into memory. Databases are well suited for indexing and retrieving components of very large datasets and show significant promise for the analysis of 3D volumetric images. In particular, array-based databases such as SciDB utilize an architecture that supports massive parallel processing while also providing database services such as data management and fast parallel queries. In this paper, we present a new set of tools that leverage the D4M (Dynamic Distributed Dimensional Data Model) toolbox for analyzing giga-voxel biomedical datasets. By combining SciDB and the D4M toolbox, we demonstrate that we can access large volumetric data and perform large-scale bioinformatics analytics efficiently and interactively. We show that it is possible to achieve an ingest rate of 2.8 million entries per second for importing large datasets into SciDB. These tools provide more efficient ways to access random sub-volumes of massive datasets and to process information that typically cannot be loaded into memory. This work describes the D4M and SciDB tools that we developed and presents the initial performance results.
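
The sketch below illustrates the chunked-array idea that makes such random sub-volume access efficient, using an in-memory Python dict in place of a real array database: the volume is stored as fixed-size chunks keyed by chunk coordinates, so reading a sub-volume touches only the chunks it overlaps. The chunk size and function names are illustrative, not SciDB's or D4M's API.

```python
import numpy as np

CHUNK = 4  # chunk edge length, illustrative

def ingest(volume):
    """Split a dense 3-D volume into a {(cx, cy, cz): block} chunk store."""
    return {
        (x, y, z): volume[x:x + CHUNK, y:y + CHUNK, z:z + CHUNK].copy()
        for x in range(0, volume.shape[0], CHUNK)
        for y in range(0, volume.shape[1], CHUNK)
        for z in range(0, volume.shape[2], CHUNK)
    }

def read_subvolume(store, lo, hi):
    """Assemble volume[lo:hi] by visiting only the overlapping chunks."""
    out = np.empty(tuple(h - l for l, h in zip(lo, hi)))
    for axes in np.ndindex(*out.shape):
        gx, gy, gz = (a + l for a, l in zip(axes, lo))  # global coordinates
        block = store[(gx // CHUNK * CHUNK, gy // CHUNK * CHUNK, gz // CHUNK * CHUNK)]
        out[axes] = block[gx % CHUNK, gy % CHUNK, gz % CHUNK]
    return out

vol = np.arange(16 ** 3, dtype=float).reshape(16, 16, 16)
store = ingest(vol)
sub = read_subvolume(store, (3, 3, 3), (7, 7, 7))
assert np.array_equal(sub, vol[3:7, 3:7, 3:7])
```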

Recommender systems for the Department of Defense and intelligence community

Summary

Recommender systems, which selectively filter information for users, can hasten analysts' responses to complex events such as cyber attacks. Lincoln Laboratory's research on recommender systems may bring the capabilities of these systems to analysts in both the Department of Defense and intelligence community.

Secure and resilient cloud computing for the Department of Defense

Summary

Cloud computing offers substantial benefits to its users: the ability to store and access massive amounts of data, on-demand delivery of computing services, the capability to widely share information, and the scalability of resource usage. Lincoln Laboratory is developing technology that will strengthen the security and resilience of cloud computing so that the Department of Defense can confidently deploy cloud services for its critical missions.