Publications

Rapid sequence identification of potential pathogens using techniques from sparse linear algebra

Summary

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that harnesses the Big Data handling and computational power of the Dynamic Distributed Dimensional Data Model (D4M). The method combines linear algebra with the statistical properties of genetic data, subsampling the data to increase computational performance while retaining accuracy. Two run modes, Fast and Wise, offer different trade-offs between speed and precision, with applications in biodefense and medical diagnostics. The D4RAGenS analysis algorithm is tested on several datasets, including three used in the Defense Threat Reduction Agency (DTRA) metagenomic algorithm contest.
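Sequence identification with sparse linear algebra can be pictured as a sparse matrix product over k-mers: rows are sequences, columns are k-mers, and multiplying the sample array by the transposed reference array counts shared k-mers per pair. The sketch below is an illustration under stated assumptions, not the D4RAGenS implementation: plain Python dictionaries stand in for D4M associative arrays, the k-mer length `K = 10` is hypothetical, and `subsample` is a hypothetical speed/accuracy knob rather than the paper's Fast or Wise mode.

```python
import random
from collections import defaultdict

K = 10  # hypothetical k-mer length; not a parameter taken from the paper

def kmers(seq, k=K):
    """Return the set of length-k substrings (k-mers) of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def to_assoc(seqs, k=K):
    """Build a sparse associative array {sequence id: {k-mer: 1}}."""
    # Sorting keeps row order deterministic, so seeded subsampling is reproducible.
    return {sid: {m: 1 for m in sorted(kmers(s, k))} for sid, s in seqs.items()}

def subsample(assoc, fraction, seed=0):
    """Keep a random fraction of each row's k-mers (hypothetical speed/accuracy knob)."""
    rng = random.Random(seed)
    return {sid: {m: v for m, v in row.items() if rng.random() < fraction}
            for sid, row in assoc.items()}

def match_counts(sample, reference):
    """Sparse product sample * reference^T: shared k-mer counts per (sample, reference) pair."""
    # Index the reference by k-mer so only nonzero entries are ever visited.
    index = defaultdict(list)
    for rid, row in reference.items():
        for m in row:
            index[m].append(rid)
    counts = defaultdict(int)
    for sid, row in sample.items():
        for m, v in row.items():
            for rid in index.get(m, ()):
                counts[(sid, rid)] += v * reference[rid][m]
    return dict(counts)

ref = to_assoc({"orgA": "ACGTACGTTGCAACGTAGGT", "orgB": "TTTTGGGGCCCCAAAATTTT"})
reads = to_assoc({"read1": "ACGTACGTTGCAAC"})
print(match_counts(reads, ref))  # {('read1', 'orgA'): 5}
```

Calling `match_counts(subsample(reads, 0.5), ref)` trades some matches for less work; the fraction that preserves identification accuracy is an empirical question the sketch does not answer.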

Cryptographically secure computation

Published in:
Computer, Vol. 48, No. 4, April 2015, pp. 78-81.

Summary

Researchers are making secure multiparty computation, a cryptographic technique that enables information sharing and analysis while keeping sensitive inputs secret, faster and easier for application software developers to use.
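Secure multiparty computation covers many protocols. As one minimal illustration, not tied to any specific system discussed in the article, additive secret sharing lets several parties learn the sum of their private inputs without revealing any individual input. The sketch assumes honest-but-curious parties and omits the secure channels a real deployment needs; the modulus `P` and the function names are illustrative choices.

```python
import secrets

# 2**61 - 1 is a Mersenne prime; any sufficiently large modulus works for additive sharing.
P = 2**61 - 1

def share(secret, n_parties):
    """Split a secret into n additive shares mod P; any n-1 shares reveal nothing about it."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % P
    return shares + [last]

def reconstruct(shares):
    """Recombine additive shares."""
    return sum(shares) % P

def secure_sum(inputs):
    """Each party shares its input; party j adds up the j-th shares it received."""
    n = len(inputs)
    # all_shares[i][j] is the share of party i's input that is sent to party j.
    all_shares = [share(x, n) for x in inputs]
    partial = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]
    # Only these partial sums are published; combining them yields the total.
    return reconstruct(partial)

print(secure_sum([120, 45, 300]))  # 465
```

Each individual share is uniformly random, so no single party learns another's input; only the aggregate is revealed.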

HEtest: a homomorphic encryption testing framework

Published in:
3rd Workshop on Encrypted Computing and Applied Homomorphic Cryptography (WAHC 2015), 30 January 2015.

Summary

In this work, we present a generic open-source software framework that can evaluate the correctness and performance of homomorphic encryption software. Our framework, called HEtest, automates the entire process of a test: generation of data for testing (such as circuits and inputs), execution of a test, comparison of performance to an insecure baseline, statistical analysis of the test results, and production of a LaTeX report. To illustrate the capability of our framework, we present a case study of our analysis of the open-source HElib homomorphic encryption software. We stress, though, that HEtest is modular, so it can easily be adapted to test any homomorphic encryption software.

Using a big data database to identify pathogens in protein data space [e-print]

Summary

Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors, or false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate goal of rapid pathogen identification in patient samples. Our approach uses the ability of big data databases to hold large, sparse associative array representations of genetic data, from which statistical patterns can be extracted and used in a variety of ways to improve identification algorithms.
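One common way to move DNA reads into protein space, assumed here for illustration rather than taken from the paper, is to translate all six reading frames with the standard genetic code and then count amino-acid k-mers. Each resulting counter could become one row of a sparse associative array in the database; the database layer, the k-mer length, and the function names are assumptions not reproduced from the source.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate uppercase DNA in frame 0; codons not in the table (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Protein translations of the three forward and three reverse-complement frames."""
    rc = dna.translate(COMPLEMENT)[::-1]
    return [translate(s[f:]) for s in (dna, rc) for f in range(3)]

def peptide_kmers(dna, k=3):
    """Count amino-acid k-mers across all six frames (one sparse row per read)."""
    counts = Counter()
    for protein in six_frames(dna):
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    return counts

print(six_frames("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Because the genetic code is degenerate, reads that differ only in synonymous codons map to the same peptide k-mers, which is one motivation for comparing sequences in protein space.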

Automated assessment of secure search systems

Summary

This work presents the results of a three-year project that assessed nine different privacy-preserving data search systems. We detail the design of a software assessment framework that focuses on low system footprint, repeatability, and reusability. A unique achievement of this project was the automation and integration of the entire test process, from the production and execution of tests to the generation of human-readable evaluation reports. We synthesize our experiences into a set of simple mantras that we recommend following in the design of any assessment framework.

Runtime integrity measurement and enforcement with automated whitelist generation

Published in:
2014 Annual Computer Security Applications Conf., ACSAC, 8-12 December 2014.

Summary

This poster discusses a strategy for automatic whitelist generation and enforcement using techniques from information flow control and trusted computing. During a measurement phase, a cloud provider uses dynamic taint tracking to generate a whitelist of executed code, along with the associated file hashes produced by an integrity measurement system. Then, at runtime, it again uses dynamic taint tracking to allow execution only of code from files whose names and integrity measurement hashes exactly match the whitelist, preventing adversaries from exploiting buffer overflows or running their own code on the system. This enables runtime integrity enforcement or attestation. Our prototype system, built on top of Intel's Pin dynamic binary instrumentation framework and the libdft taint tracking system, demonstrates high accuracy in tracking the sources of instructions.
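The whitelist logic itself, separate from the taint tracking that attributes each instruction to a source file, reduces to an exact match on file name and integrity hash. A minimal sketch, assuming SHA-256 as the measurement hash and assuming the caller already knows which file the code came from (in the described system that attribution comes from dynamic taint tracking, which is not reproduced here); all function names are hypothetical.

```python
import hashlib
from pathlib import Path

def measure(path):
    """Integrity measurement: SHA-256 digest of the file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def build_whitelist(executed_files):
    """Measurement phase: record name -> hash for every file that code executed from."""
    return {str(p): measure(p) for p in executed_files}

def allow_execution(path, whitelist):
    """Enforcement phase: permit code only if both the name and the current hash match."""
    expected = whitelist.get(str(path))
    return expected is not None and measure(path) == expected
```

A file that was never measured is rejected, and so is a measured file whose contents have since changed, since both the name and the hash must match.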

On the challenges of effective movement

Published in:
ACM Workshop on Moving Target Defense (MTD 2014), 3 November 2014.

Summary

Moving Target (MT) defenses have been proposed as a game-changing approach to rebalance the security landscape in favor of the defender. MT techniques make systems less deterministic, less static, and less homogeneous in order to increase the level of effort required to achieve a successful compromise. However, a number of challenges in achieving effective movement lead to weaknesses in MT techniques that attackers can often exploit to bypass or otherwise nullify the impact of that movement. In this paper, we propose that these challenges can be grouped into three main types: coverage, unpredictability, and timeliness. We describe these challenges and study how they affect prominent MT techniques. We also discuss a number of other considerations faced when designing and deploying MT defenses.

Information leaks without memory disclosures: remote side channel attacks on diversified code

Published in:
CCS 2014: Proc. of the ACM Conf. on Computer and Communications Security, 3-7 November 2014.

Summary

Code diversification has been proposed as a technique to mitigate code reuse attacks, which have recently become the predominant way for attackers to exploit memory corruption vulnerabilities. Because code reuse attacks require detailed knowledge of where code is in memory, diversification techniques attempt to mitigate these attacks by randomizing which instructions are executed and where code is located in memory. Because an attacker cannot read the diversified code, it is assumed that the attacker cannot reliably exploit it. In this paper, we show that this fundamental assumption behind code diversity can be broken, because executing the code reveals information about the code. Thus, information can be leaked without reading the code. We demonstrate how an attacker can use a memory corruption vulnerability to create side channels that leak information in novel ways, removing the need for a memory disclosure vulnerability. We introduce seven new classes of attacks involving fault analysis and timing side channels, each of which allows a remote attacker to learn how code has been diversified.

Quantitative evaluation of dynamic platform techniques as a defensive mechanism

Published in:
RAID 2014: 17th Int. Symp. on Research in Attacks, Intrusions, and Defenses, 17-19 September 2014.

Summary

Cyber defenses based on dynamic platform techniques have been proposed as a way to make systems more resilient to attacks. These defenses change the properties of the platforms in order to make attacks more complicated. Unfortunately, little work has been done on measuring the effectiveness of these defenses. In this work, we first measure the protection provided by a dynamic platform technique on a testbed. The counterintuitive results obtained from the testbed guide us in identifying and quantifying the major effects contributing to the protection in such a system. Based on these abstracted effects, we develop a generalized model of dynamic platform techniques that can be used to quantify their effectiveness. To verify and validate our results, we simulate the generalized model and show that the testbed measurements and the simulations match with a small amount of error. Finally, we enumerate a number of lessons learned in our work that can be applied to the quantitative evaluation of other defensive techniques.

Computing on masked data: a high performance method for improving big data veracity

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support for sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples are shown for the application of CMD to a complex DNA matching algorithm and to database operations over social media data.
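Computing directly on masked data can be illustrated by deterministically masking the row and column keys of an associative array: equal keys mask to equal tokens, so join-like sparse products still line up without the server seeing plaintext keys. This is a minimal sketch under stated assumptions, not the CMD scheme itself: HMAC-SHA256 stands in for whatever masking CMD uses, values are left unmasked, the recipient is assumed to know the candidate row keys, and all helper names are hypothetical. Deterministic masking also reveals which masked keys are equal, which is part of the overhead-versus-protection trade-off.

```python
import hashlib
import hmac
from collections import defaultdict

def mask(key, secret):
    """Deterministically mask a string key with HMAC-SHA256 (illustrative stand-in)."""
    return hmac.new(secret, key.encode(), hashlib.sha256).hexdigest()[:16]

def mask_assoc(assoc, secret):
    """Mask every row and column key of a sparse array {row: {col: value}}."""
    return {mask(r, secret): {mask(c, secret): v for c, v in cols.items()}
            for r, cols in assoc.items()}

def row_products(a):
    """Compute A * A^T: for each row pair, the sum over shared columns of value products."""
    by_col = defaultdict(dict)
    for r, cols in a.items():
        for c, v in cols.items():
            by_col[c][r] = v
    out = defaultdict(int)
    for rows in by_col.values():
        for r1, v1 in rows.items():
            for r2, v2 in rows.items():
                out[(r1, r2)] += v1 * v2
    return dict(out)

def unmask(result, keys, secret):
    """Authorized recipient: map masked row keys back using the known key set."""
    lookup = {mask(k, secret): k for k in keys}
    return {(lookup[r1], lookup[r2]): v for (r1, r2), v in result.items()}

secret = b"shared-secret"  # hypothetical key held only by authorized parties
A = {"seqA": {"ACGT": 1, "CGTA": 1}, "seqB": {"ACGT": 1, "TTTT": 1}}
masked = mask_assoc(A, secret)  # what an untrusted database would store
R = row_products(masked)        # computed without plaintext keys
print(unmask(R, A.keys(), secret)[("seqA", "seqB")])  # 1
```

The database performs the same sparse product it would on plaintext, so the added cost is mainly the per-key masking, which is consistent with the low-overhead motivation described above.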