Publications

On the challenges of effective movement

Published in:
ACM Workshop on Moving Target Defense (MTD 2014), 3 November 2014.

Summary

Moving Target (MT) defenses have been proposed as a game-changing approach to rebalance the security landscape in favor of the defender. MT techniques make systems less deterministic, less static, and less homogeneous in order to increase the level of effort required to achieve a successful compromise. However, a number of challenges in achieving effective movement lead to weaknesses in MT techniques that attackers can often use to bypass or otherwise nullify the impact of that movement. In this paper, we propose that these challenges can be grouped into three main types: coverage, unpredictability, and timeliness. We provide a description of these challenges and study how they impact prominent MT techniques. We also discuss a number of other considerations faced when designing and deploying MT defenses.

Information leaks without memory disclosures: remote side channel attacks on diversified code

Published in:
CCS 2014: Proc. of the ACM Conf. on Computer and Communications Security, 3-7 November 2014.

Summary

Code diversification has been proposed as a technique to mitigate code reuse attacks, which have recently become the predominant way for attackers to exploit memory corruption vulnerabilities. As code reuse attacks require detailed knowledge of where code is in memory, diversification techniques attempt to mitigate these attacks by randomizing what instructions are executed and where code is located in memory. As an attacker cannot read the diversified code, it is assumed he cannot reliably exploit the code. In this paper, we show that the fundamental assumption behind code diversity can be broken, as executing the code reveals information about the code. Thus, we can leak information without needing to read the code. We demonstrate how an attacker can utilize a memory corruption vulnerability to create side channels that leak information in novel ways, removing the need for a memory disclosure vulnerability. We introduce seven new classes of attacks that involve fault analysis and timing side channels, where each allows a remote attacker to learn how code has been diversified.

Spectral anomaly detection in very large graphs: Models, noise, and computational complexity

Published in:
Proceedings of Seminar 14461: High-performance Graph Algorithms and Applications in Computational Science, Wadern, Germany

Summary

Anomaly detection in massive networks has numerous theoretical and computational challenges, especially as the behavior to be detected becomes small in comparison to the larger network. This presentation focuses on recent results in three key technical areas, specifically geared toward spectral methods for detection.

Finding good enough: a task-based evaluation of query biased summarization for cross language information retrieval

Published in:
EMNLP 2014, Proc. of Conf. on Empirical Methods in Natural Language Processing, 25-29 October, 2014, pp. 657-69.

Summary

In this paper we present our task-based evaluation of query-biased summarization for cross-language information retrieval (CLIR) using relevance prediction. We describe our 13 summarization methods, each drawn from one of four summarization strategies. We show how well our methods perform using Farsi text from the CLEF 2008 shared task, which we translated to English automatically. We report precision/recall/F1, accuracy, and time-on-task. We found that different summarization methods perform optimally for different evaluation metrics, but overall query-biased word clouds are the best summarization strategy. In our analysis, we demonstrate that using the ROUGE metric on our sentence-based summaries cannot make the same kinds of distinctions as our evaluation framework does. Finally, we present our recommendations for creating much-needed evaluation standards and databases.
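
For readers less familiar with the metrics named in this summary, the following is a minimal sketch of how precision, recall, F1, and accuracy are conventionally computed from relevance judgments. The function names and the counts are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all judgments (relevant or not) that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts: a judge marks 10 documents relevant, 8 of them correctly,
# and misses 4 documents that were actually relevant.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
acc = accuracy(tp=8, fp=2, fn=4, tn=6)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} accuracy={acc:.3f}")
```

Because the metrics weight errors differently, a summarization method can rank first on one and lower on another, which is consistent with the finding reported above.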

Quantitative evaluation of dynamic platform techniques as a defensive mechanism

Published in:
RAID 2014: 17th Int. Symp. on Research in Attacks, Intrusions, and Defenses, 17-19 September 2014.

Summary

Cyber defenses based on dynamic platform techniques have been proposed as a way to make systems more resilient to attacks. These defenses change the properties of the platforms in order to make attacks more complicated. Unfortunately, little work has been done on measuring the effectiveness of these defenses. In this work, we first measure the protection provided by a dynamic platform technique on a testbed. The counter-intuitive results obtained from the testbed guide us in identifying and quantifying the major effects contributing to the protection in such a system. Based on the abstract effects, we develop a generalized model of dynamic platform techniques which can be used to quantify their effectiveness. To verify and validate our results, we simulate the generalized model and show that the testbed measurements and the simulations match with a small amount of error. Finally, we enumerate a number of lessons learned in our work which can be applied to quantitative evaluation of other defensive techniques.

Using deep belief networks for vector-based speaker recognition

Published in:
INTERSPEECH 2014: 15th Annual Conf. of the Int. Speech Communication Assoc., 14-18 September 2014.

Summary

Deep belief networks (DBNs) have become a successful approach for acoustic modeling in speech recognition. DBNs exhibit strong approximation properties, offer improved performance, and are parameter efficient. In this work, we propose methods for applying DBNs to speaker recognition. In contrast to prior work, our approach to DBNs for speaker recognition starts at the acoustic modeling layer. We use sparse-output DBNs trained with both unsupervised and supervised methods to generate statistics for use in standard vector-based speaker recognition methods. We show that a DBN can replace a GMM UBM in this processing. Methods, qualitative analysis, and results are given on a NIST SRE 2012 task. Overall, our results show that DBNs perform competitively with modern approaches in an initial implementation of our framework.

Talking Head Detection by Likelihood-Ratio Test

Published in:
Second Workshop on Speech, Language, Audio in Multimedia

Summary

Detecting accurately when a person whose face is visible in an audio-visual medium is the audible speaker is an enabling technology with a number of useful applications. The likelihood-ratio test formulation and feature signal processing employed here allow the use of high-dimensional feature sets in the audio and visual domain, and the approach appears to have good detection performance for AV segments as short as a few seconds.
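
The summary does not give the paper's models or features, so the following is only a generic sketch of a likelihood-ratio test. It assumes a single scalar feature per frame, independent frames, and Gaussian models under each hypothesis; the parameter values, the names `H1` and `H0`, and the threshold are all hypothetical.

```python
import math


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood_ratio(frames, h1, h0) -> float:
    """Sum over frames of log p(x | H1) - log p(x | H0), assuming independence."""
    return sum(gaussian_logpdf(x, *h1) - gaussian_logpdf(x, *h0) for x in frames)


def detect(frames, h1, h0, threshold: float = 0.0) -> bool:
    """Decide H1 (visible face is the audible speaker) when the ratio exceeds the threshold."""
    return log_likelihood_ratio(frames, h1, h0) > threshold


# Hypothetical (mean, variance) of a per-frame audio-visual feature under each hypothesis.
H1 = (1.0, 1.0)  # visible person is speaking
H0 = (0.0, 1.0)  # visible person is not the speaker

segment = [1.0, 0.5, 1.5]
print(log_likelihood_ratio(segment, H1, H0), detect(segment, H1, H0))
```

Summing per-frame log ratios means that evidence accumulates with segment length, which is one reason such a test can remain usable on short segments. The high-dimensional features mentioned in the summary would need multivariate models in place of the scalar one used here.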

Computing on masked data: a high performance method for improving big data veracity

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support of sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples are shown for the application of CMD to a complex DNA matching algorithm and to database operations over social media data.
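
The general idea of computing on masked values can be illustrated with a small sketch. It is not the CMD scheme from the paper: it assumes one simple form of deterministic masking (a keyed HMAC-SHA256 digest), under which equal plaintexts produce equal masks, so an untrusted party can count and match entries without seeing them. The key, the sample records, and the helper names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def mask(key: bytes, value: str) -> str:
    """Deterministically mask a value: equal inputs yield equal masks under one key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def unmask_counts(key: bytes, counts: Counter, candidates) -> dict:
    """Key holder maps masked counts back to plaintext via a lookup over known values."""
    lookup = {mask(key, c): c for c in candidates}
    return {lookup[m]: n for m, n in counts.items() if m in lookup}


SECRET = b"hypothetical-demo-key"  # held only by authorized parties
records = ["alice", "bob", "alice", "carol", "alice"]

# The data owner masks the records before handing them off.
masked = [mask(SECRET, r) for r in records]

# An untrusted party can still aggregate, seeing only masked strings.
masked_counts = Counter(masked)

# An authorized recipient recovers the plaintext result.
print(unmask_counts(SECRET, masked_counts, ["alice", "bob", "carol"]))
```

Deterministic masking of this kind exposes equality and frequency patterns to whoever holds the masked data, so it trades some confidentiality for the ability to compute.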

A test-suite generator for database systems

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

In this paper, we describe the SPAR Test Suite Generator (STSG), a new test-suite generator for SQL style database systems. This tool produced an entire test suite (data, queries, and ground-truth answers) as a unit and in response to a user's specification. Thus, database evaluators could use this tool to craft test suites for particular aspects of a specific database system. The inclusion of ground-truth answers in the produced test suite, furthermore, allowed this tool to support both benchmarking (at various scales) and correctness-checking in a repeatable way. Lastly, the test-suite generator of this document was extensively profiled and optimized, and was designed for test-time agility.

Achieving 100,000,000 database inserts per second using Accumulo and D4M

Summary

The Apache Accumulo database is an open-source, relaxed-consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved, which is 100x larger than the highest previously published value for any other database. The performance scales linearly with the number of ingest clients, number of database servers, and data size. The performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data programming.