Publications
Tagged As
Temporal and multi-source fusion for detection of innovation in collaboration networks
Summary
Summary
A common problem in network analysis is detecting small subgraphs of interest within a large background graph. This includes multi-source fusion scenarios where data from several modalities must be integrated to form the network. This paper presents an application of novel techniques leveraging the signal processing for graphs algorithmic framework...
Global pattern search at scale
Summary
Summary
In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to...
Rapid sequence identification of potential pathogens using techniques from sparse linear algebra
Summary
Summary
The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing has rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling...
Computing on Masked Data to improve the security of big data
Summary
Summary
Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need for improving the security of data stored in such...
Using a big data database to identify pathogens in protein data space [e-print]
Summary
Summary
Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors/false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate...
NEU_MITLL @ TRECVid 2015: multimedia event detection by pre-trained CNN models
Summary
Summary
We introduce a framework for multimedia event detection (MED), which was developed for TRECVID 2015 using convolutional neural networks (CNNs) to detect complex events via deterministic models trained on video frame data. We used several well-known CNN models designed to detect objects, scenes, and a combination of both (i.e., Hybrid-CNN)...
Spectral anomaly detection in very large graphs: Models, noise, and computational complexity(92.92 KB)
Summary
Summary
Anomaly detection in massive networks has numerous theoretical and computational challenges, especially as the behavior to be detected becomes small in comparison to the larger network. This presentation focuses on recent results in three key technical areas, specifically geared toward spectral methods for detection.
Achieving 100,000,000 database inserts per second using Accumulo and D4M
Summary
Summary
The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed...
Genetic sequence matching using D4M big data approaches
Summary
Summary
Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method...
Big Data dimensional analysis
Summary
Summary
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. One of the main challenges associated with big data...