Publications


By: Luke R. Johnson

Sampling operations on big data

November 8, 2015

Conference Paper

Author:

Vijay N. Gadepally

…

Published in:

2015 Asilomar Conf. on Signals, Systems and Computers, 8-11 November 2015.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Secure Resilient Systems and Technology

Summary

The 3Vs of Big Data (Volume, Velocity, and Variety) continue to pose a large challenge for systems and algorithms designed to store, process, and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals, are often undefined for data characterized by heterogeneity, high dimensionality, and lack of known structure. In this article, we describe and demonstrate an approach to sampling large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that even a heavily sampled dataset can still yield meaningful link prediction results.
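To make the idea of sampling a graph-shaped dataset concrete, here is a minimal Python sketch of one simple scheme, uniform (Bernoulli) edge sampling, in which each edge is kept independently with probability p. This is an illustrative assumption, not necessarily the approach described in the paper; the function `sample_edges`, its parameters `p` and `seed`, and the toy ring graph are all hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[str, str]


def sample_edges(edges: List[Edge], p: float, seed: int = 0) -> List[Edge]:
    """Uniform (Bernoulli) edge sampling: keep each edge independently with probability p.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    # random() returns a float in [0.0, 1.0), so p=0 keeps nothing and p=1 keeps everything
    return [e for e in edges if rng.random() < p]


if __name__ == "__main__":
    # Hypothetical toy graph: a ring of 1,000 vertices standing in for social-media interactions
    n = 1000
    edges = [(f"v{i}", f"v{(i + 1) % n}") for i in range(n)]
    sampled = sample_edges(edges, p=0.1, seed=42)
    # Roughly p times the number of edges should survive
    print(f"kept {len(sampled)} of {len(edges)} edges")
```

Because the expected number of retained edges is p times the edge count, the sampling rate directly controls how much work a downstream analytic such as link prediction must do; an evaluation would then compare predictions made on the sampled graph against those made on the full graph.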


Sampling large graphs for anticipatory analytics

September 15, 2015

Conference Paper

Author:

Lauren Milechin

…

Published in:

HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Topic:

big data

R&D area:

Cyber Security and Information Sciences

R&D group:

Secure Resilient Systems and Technology

Summary

The characteristics of Big Data, often dubbed the 3Vs for volume, velocity, and variety, will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include purchasing larger systems, increasing human-in-the-loop involvement, or using more complex algorithms. We are investigating the use of sampling to mitigate these challenges, specifically the sampling of large graphs. Large datasets can often be represented as graphs in which data entries may be edges and vertices may be attributes of the data. In particular, we present the results of sampling for the task of link prediction. Link prediction estimates the probability that a new edge will form between two vertices of a graph, and it has numerous applications in understanding social and biological networks. In this paper, we propose a series of techniques for sampling large datasets. To quantify the effect of these techniques, we report the quality of link prediction on sampled graphs and the time saved in computing link prediction statistics on these sampled graphs.
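The graph view of a dataset and the link prediction task can be illustrated with a small Python sketch. It assumes each data entry becomes an undirected edge between two vertices, and it scores candidate links with the Jaccard coefficient of the endpoints' neighbor sets, one common link prediction heuristic whose score serves as a proxy for the likelihood of a new edge rather than a calibrated probability. The paper's own statistics may differ, and the names `build_graph`, `jaccard`, and `top_k_predictions` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(entries: Iterable[Tuple[str, str]]) -> Graph:
    """Turn data entries into an undirected adjacency map (each entry is one edge)."""
    adj: Graph = defaultdict(set)
    for u, v in entries:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def jaccard(adj: Graph, u: str, v: str) -> float:
    """Jaccard coefficient |N(u) & N(v)| / |N(u) | N(v)| of two vertices' neighbor sets."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def top_k_predictions(adj: Graph, k: int) -> List[Tuple[str, str, float]]:
    """Score every non-adjacent vertex pair and return the k highest-scoring candidate links."""
    scored = []
    # All-pairs scoring is quadratic in the number of vertices
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            s = jaccard(adj, u, v)
            if s > 0:
                scored.append((s, u, v))
    # Highest score first; ties broken alphabetically for deterministic output
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v, s) for s, u, v in scored[:k]]


if __name__ == "__main__":
    # Hypothetical toy entries: a and d share both neighbors (b and c) but are not yet linked
    entries = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")]
    adj = build_graph(entries)
    print(top_k_predictions(adj, 3))  # → [('a', 'd', 1.0)]
```

Scoring every non-adjacent pair grows quadratically with the number of vertices, which is one reason that running the same computation on a smaller, sampled graph can save time.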
