Publications

Curator: provenance management for modern distributed systems

Published in:
10th Intl. Workshop on Theory and Practice of Provenance, TaPP, 11-12 July 2018.

Summary

Data provenance is a valuable tool for protecting and troubleshooting distributed systems. Careful design of the provenance components reduces the impact on the design, implementation, and operation of the distributed system. In this paper, we present Curator, a provenance management toolkit that can be easily integrated with microservice-based systems and other modern distributed systems. This paper describes the design of Curator and discusses how we have used Curator to add provenance to distributed systems. We find that our approach requires no changes to the design of these distributed systems and adds only minimal code and dependencies to manage. In addition, Curator uses the same scalable infrastructure as the distributed system and can therefore scale with the distributed system.
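
The integration claim can be pictured with a small sketch: the service emits provenance records over whatever transport it already uses, so no new infrastructure is introduced. This is only an illustration of the pattern; the function and field names below are invented and are not Curator's actual API.

```python
import json
import time
import uuid

def record_provenance(publish, activity, inputs, outputs):
    """Publish one provenance record on the service's existing message bus.

    `publish` is whatever send function the microservice already has
    (e.g., a thin wrapper around its existing queue producer), so there is
    nothing new to deploy or scale separately.
    """
    record = {
        "id": str(uuid.uuid4()),
        "time": time.time(),
        "activity": activity,        # the handler or job that ran
        "used": list(inputs),        # identifiers of consumed entities
        "generated": list(outputs),  # identifiers of produced entities
    }
    publish("provenance", json.dumps(record))
    return record["id"]
```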

Automated provenance analytics: a regular grammar based approach with applications in security

Published in:
9th Intl. Workshop on Theory and Practice of Provenance, TaPP, 22-23 June 2017.

Summary

Provenance collection techniques have been carefully studied in the literature, and there are now several systems to automatically capture provenance data. However, the analysis of provenance data is often left "as an exercise for the reader". The provenance community needs tools that allow users to quickly sort through large volumes of provenance data and identify records that require further investigation. By detecting anomalies in provenance data that deviate from established patterns, we hope to actively thwart security threats. In this paper, we discuss issues with current graph analysis techniques as applied to data provenance, particularly Frequent Subgraph Mining (FSM). Then we introduce Directed Acyclic Graph regular grammars (DAGr) as a model for provenance data and show how they can detect anomalies. These DAGr provide an expressive characterization of DAGs, and by using regular grammars as a formalism, we can apply results from formal language theory to learn the difference between "good" and "bad" provenance. We propose a restricted subclass of DAGr called deterministic Directed Acyclic Graph automata (dDAGa) that guarantees parsing in linear time. Finally, we propose a learning algorithm for dDAGa, inspired by Minimum Description Length for Grammar Induction.
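
As a rough illustration of the dDAGa idea, the sketch below assigns one state to each node of a provenance DAG in topological order and rejects any graph that requires a transition the grammar does not define. The transition table and example graphs are invented, and the paper's formal construction may differ; the point is that a single pass over nodes and edges suffices, mirroring the linear-time parsing guarantee.

```python
from graphlib import TopologicalSorter

# Hypothetical transition table: (node label, sorted predecessor states) -> state.
TRANSITIONS = {
    ("file", ()): "q_src",
    ("process", ("q_src",)): "q_ok",
    ("file", ("q_ok",)): "q_ok",
    ("process", ("q_ok", "q_src")): "q_ok",
}
ACCEPT = {"q_ok"}

def accepts(labels, edges):
    """Return True if the provenance DAG parses under the (toy) grammar.

    labels: {node: label}; edges: list of (src, dst) pairs, src -> dst.
    Runs in time linear in the number of nodes and edges.
    """
    preds = {n: set() for n in labels}
    for src, dst in edges:
        preds[dst].add(src)
    state = {}
    for node in TopologicalSorter(preds).static_order():
        key = (labels[node], tuple(sorted(state[p] for p in preds[node])))
        if key not in TRANSITIONS:        # no rule applies: flag as anomalous
            return False
        state[node] = TRANSITIONS[key]
    sinks = set(labels) - {src for src, _ in edges}
    return all(state[n] in ACCEPT for n in sinks)

# A benign read/write chain parses; a process with no known derivation does not.
good = accepts({"a": "file", "b": "process", "c": "file"},
               [("a", "b"), ("b", "c")])   # True
bad = accepts({"x": "process", "y": "file"},
              [("x", "y")])                # False: ("process", ()) has no rule
```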

Bootstrapping and maintaining trust in the cloud

Published in:
32nd Annual Computer Security Applications Conf., ACSAC 2016, 5-9 December 2016.

Summary

Today's infrastructure as a service (IaaS) cloud environments rely upon full trust in the provider to secure applications and data. Cloud providers do not offer the ability to create hardware-rooted cryptographic identities for IaaS cloud resources or sufficient information to verify the integrity of systems. Trusted computing protocols and hardware like the TPM have long promised a solution to this problem. However, these technologies have not seen broad adoption because of their complexity of implementation, low performance, and lack of compatibility with virtualized environments. In this paper, we introduce keylime, a scalable trusted cloud key management system. keylime provides an end-to-end solution for both bootstrapping hardware-rooted cryptographic identities for IaaS nodes and for system integrity monitoring of those nodes via periodic attestation. We support these functions in both bare-metal and virtualized IaaS environments using a virtual TPM. keylime provides a clean interface that allows higher-level security services like disk encryption or configuration management to leverage trusted computing without being trusted computing aware. We show that our bootstrapping protocol can derive a key in less than two seconds, that we can detect system integrity violations in as little as 110 ms, and that keylime can scale to thousands of IaaS cloud nodes.
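
The periodic-attestation loop described above can be pictured roughly as follows. The helper names (get_quote, verify_quote, revoke_node) are placeholders rather than keylime's actual API, and the real protocol also covers key bootstrapping and virtual-TPM support, which this sketch omits.

```python
import os
import time

def monitor(node, expected_pcrs, get_quote, verify_quote, revoke_node,
            poll_interval=0.11):  # ~110 ms, on the order of the reported detection latency
    """Poll a node's TPM quote and revoke its bootstrapped key on a mismatch."""
    while True:
        nonce = os.urandom(20)                       # fresh nonce defeats replayed quotes
        quote = get_quote(node, nonce)
        if not verify_quote(quote, nonce, expected_pcrs):
            revoke_node(node)                        # e.g., have the CA revoke the node's key
            return
        time.sleep(poll_interval)
```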

Leveraging data provenance to enhance cyber resilience

Summary

Building secure systems used to mean ensuring a secure perimeter, but that is no longer the case. Today's systems are ill-equipped to deal with attackers that are able to pierce perimeter defenses. Data provenance is a critical technology for building resilient systems that can recover from attackers who manage to overcome the "hard-shell" defenses. In this paper, we provide background on data provenance and detail provenance collection, analysis, and storage techniques and their challenges. Data provenance is well suited to the challenging problem of allowing a system to "fight through" an attack, and we identify the work needed to ensure that future systems are resilient.

High-throughput ingest of data provenance records in Accumulo

Published in:
HPEC 2016: IEEE Conf. on High Performance Extreme Computing, 13-15 September 2016.

Summary

Whole-system data provenance provides deep insight into the processing of data on a system, including detecting data integrity attacks. The downside of systems that collect whole-system data provenance is the sheer volume of data generated under many heavy workloads. To make provenance metadata useful, it must be stored where it can be queried. This problem becomes even more challenging when considering a network of provenance-aware machines all collecting this metadata. In this paper, we investigate the use of D4M and Accumulo to support high-throughput ingest of whole-system provenance data. We find that we are able to ingest 3,970 graph components per second. Centrally storing the provenance metadata allows us to build systems that can detect and respond to data integrity attacks that are captured by the provenance system.
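
A D4M-style layout for this kind of ingest can be sketched as follows: each provenance edge becomes a (row, column, value) triple in an associative array, plus a transpose entry so queries run in either direction, and triples are flushed to Accumulo in large batches. The entity and relation names below are made up, and the actual write would go through an Accumulo client (e.g., a Java BatchWriter), which this sketch does not include.

```python
def edges_to_triples(edges):
    """Map provenance edges (src, relation, dst) to sparse adjacency triples."""
    triples = []
    for src, relation, dst in edges:
        triples.append((src, dst, relation))           # row = src, column = dst
        triples.append((dst, src, "inv|" + relation))  # transpose entry for reverse queries
    return triples

batch = edges_to_triples([
    ("file:/etc/passwd", "wasGeneratedBy", "proc:1234"),
])
# `batch` would then be written to the Accumulo table in large groups of mutations.
```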

Secure and resilient cloud computing for the Department of Defense

Summary

Cloud computing offers substantial benefits to its users: the ability to store and access massive amounts of data, on-demand delivery of computing services, the capability to widely share information, and the scalability of resource usage. Lincoln Laboratory is developing technology that will strengthen the security and resilience of cloud computing so that the Department of Defense can confidently deploy cloud services for its critical missions.

Runtime integrity measurement and enforcement with automated whitelist generation

Published in:
2014 Annual Computer Security Applications Conf., ACSAC, 8-12 December 2014.

Summary

This poster discusses a strategy for automatic whitelist generation and enforcement using techniques from information flow control and trusted computing. During a measurement phase, a cloud provider uses dynamic taint tracking to generate a whitelist of executed code and associated file hashes generated by an integrity measurement system. Then, at runtime, it can again use dynamic taint tracking to enforce execution only of code from files whose names and integrity measurement hashes exactly match the whitelist, preventing adversaries from exploiting buffer overflows or running their own code on the system. This provides the capability for runtime integrity enforcement or attestation. Our prototype system, built on top of Intel's PIN emulation environment and the libdft taint tracking system, demonstrates high accuracy in tracking the sources of instructions.
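
A much simplified, file-level version of the two phases reads as follows; the actual system tracks the provenance of individual instructions inside Intel's PIN with libdft, which this sketch does not attempt to reproduce, and the choice of SHA-256 here is an assumption.

```python
import hashlib

def measure(executed_files):
    """Measurement phase: record path -> SHA-256 digest for each executed file."""
    whitelist = {}
    for path in executed_files:
        with open(path, "rb") as f:
            whitelist[path] = hashlib.sha256(f.read()).hexdigest()
    return whitelist

def may_execute(path, whitelist):
    """Enforcement phase: allow only files whose name and digest match exactly."""
    try:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
    except OSError:
        return False
    return whitelist.get(path) == digest
```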
