Publications
Sampling large graphs for anticipatory analytics
Summary
Summary
The characteristics of Big Data - often dubbed the 3V's for volume, velocity, and variety - will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include the purchase of larger systems, greater human-in-the-loop involvement, or...
Using a power law distribution to describe big data
Summary
Summary
The gap between data production and user ability to access, compute and produce meaningful results calls for tools that address the challenges associated with big data volume, velocity and variety. One of the key hurdles is the inability to methodically remove expected or uninteresting elements from large data sets. This...
Parallel vectorized algebraic AES in MATLAB for rapid prototyping of encrypted sensor processing algorithms and database analytics
Summary
Summary
The increasing use of networked sensor systems and networked databases has led to an increased interest in incorporating encryption directly into sensor algorithms and database analytics. MATLAB is the dominant tool for rapid prototyping of sensor algorithms and has extensive database analytics capabilities. The advent of high level and high...
Improving big data visual analytics with interactive virtual reality
Summary
Summary
For decades, the growth and volume of digital data collection has made it challenging to digest large volumes of information and extract underlying structure. Coined 'Big Data', massive amounts of information has quite often been gathered inconsistently (e.g from many sources, of various forms, at different rates, etc.). These factors...
Computing on Masked Data to improve the security of big data
Summary
Summary
Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need for improving the security of data stored in such...
Big Data dimensional analysis
Summary
Summary
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. One of the main challenges associated with big data...
A survey of cryptographic approaches to securing big-data analytics in the cloud
Summary
Summary
The growing demand for cloud computing motivates the need to study the security of data received, stored, processed, and transmitted by a cloud. In this paper, we present a framework for such a study. We introduce a cloud computing model that captures a rich class of big-data use-cases and allows...
Computing on masked data: a high performance method for improving big data veracity
Summary
Summary
The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic...
Computing on masked data: a high performance method for improving big data veracity
Summary
Summary
The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic...
Achieving 100,000,000 database inserts per second using Accumulo and D4M
Summary
Summary
The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed...