Publications

Runtime integrity measurement and enforcement with automated whitelist generation

Published in:
2014 Annual Computer Security Applications Conf., ACSAC, 8-12 December 2014.

Summary

This poster discusses a strategy for automatic whitelist generation and enforcement using techniques from information flow control and trusted computing. During a measurement phase, a cloud provider uses dynamic taint tracking to generate a whitelist of executed code, with the associated file hashes produced by an integrity measurement system. Then, at runtime, it can again use dynamic taint tracking to enforce execution only of code from files whose names and integrity measurement hashes exactly match the whitelist, preventing adversaries from exploiting buffer overflows or running their own code on the system. This provides the capability for runtime integrity enforcement or attestation. Our prototype system, built on top of Intel's Pin dynamic instrumentation environment and the libdft taint tracking system, demonstrates high accuracy in tracking the sources of instructions.
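
The measure-then-enforce split can be illustrated with a simple file-hash whitelist. The sketch below is not the authors' Pin/libdft prototype; it is a minimal Python illustration, with the whitelist format, file paths, and example binaries chosen purely for the example.

```python
import hashlib
import json
import os

def file_hash(path):
    """Return the SHA-256 digest of a file, standing in for an integrity measurement."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def measure(executed_files, whitelist_path="whitelist.json"):
    """Measurement phase: record name -> hash for every file observed executing."""
    whitelist = {os.path.abspath(p): file_hash(p) for p in executed_files}
    with open(whitelist_path, "w") as f:
        json.dump(whitelist, f, indent=2)
    return whitelist

def enforce(path, whitelist_path="whitelist.json"):
    """Enforcement phase: allow execution only if name and hash match the whitelist."""
    with open(whitelist_path) as f:
        whitelist = json.load(f)
    path = os.path.abspath(path)
    return whitelist.get(path) == file_hash(path)

if __name__ == "__main__":
    measure(["/bin/ls"])         # trusted baseline run
    print(enforce("/bin/ls"))    # True: file name and hash match the whitelist
    print(enforce("/bin/cat"))   # False: not on the whitelist
```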

On the challenges of effective movement

Published in:
ACM Workshop on Moving Target Defense (MTD 2014), 3 November 2014.

Summary

Moving Target (MT) defenses have been proposed as a game-changing approach to rebalance the security landscape in favor of the defender. MT techniques make systems less deterministic, less static, and less homogeneous in order to increase the level of effort required to achieve a successful compromise. However, a number of challenges in achieving effective movement lead to weaknesses in MT techniques that attackers can often use to bypass or otherwise nullify the impact of that movement. In this paper, we propose that these challenges can be grouped into three main types: coverage, unpredictability, and timeliness. We provide a description of these challenges and study how they impact prominent MT techniques. We also discuss a number of other considerations faced when designing and deploying MT defenses.

Information leaks without memory disclosures: remote side channel attacks on diversified code

Published in:
CCS 2014: Proc. of the ACM Conf. on Computer and Communications Security, 3-7 November 2014.

Summary

Code diversification has been proposed as a technique to mitigate code reuse attacks, which have recently become the predominant way for attackers to exploit memory corruption vulnerabilities. As code reuse attacks require detailed knowledge of where code is in memory, diversification techniques attempt to mitigate these attacks by randomizing what instructions are executed and where code is located in memory. As an attacker cannot read the diversified code, it is assumed he cannot reliably exploit the code. In this paper, we show that the fundamental assumption behind code diversity can be broken, as executing the code reveals information about the code. Thus, we can leak information without needing to read the code. We demonstrate how an attacker can utilize a memory corruption vulnerability to create side channels that leak information in novel ways, removing the need for a memory disclosure vulnerability. We introduce seven new classes of attacks that involve fault analysis and timing side channels, where each allows a remote attacker to learn how code has been diversified.
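
As a rough illustration of the timing side-channel idea only (not the specific attack classes introduced in the paper), the sketch below compares response-time distributions for two probe payloads sent to a hypothetical remote service; a persistent gap between the distributions suggests the probes exercise different code paths, leaking information about how the binary was diversified. The host, port, and payloads are placeholders.

```python
import socket
import statistics
import time

HOST, PORT = "192.0.2.10", 8080   # hypothetical target service

def time_request(payload, trials=50):
    """Send a payload repeatedly and collect round-trip times in seconds."""
    samples = []
    for _ in range(trials):
        with socket.create_connection((HOST, PORT), timeout=2) as s:
            start = time.perf_counter()
            s.sendall(payload)
            s.recv(4096)
            samples.append(time.perf_counter() - start)
    return samples

if __name__ == "__main__":
    a = time_request(b"PROBE-A\n")
    b = time_request(b"PROBE-B\n")
    # A consistent difference in medians across many trials is the signal
    # a remote attacker would look for.
    print("median A:", statistics.median(a), "median B:", statistics.median(b))
```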

Quantitative evaluation of dynamic platform techniques as a defensive mechanism

Published in:
RAID 2014: 17th Int. Symp. on Research in Attacks, Intrusions, and Defenses, 17-19 September 2014.

Summary

Cyber defenses based on dynamic platform techniques have been proposed as a way to make systems more resilient to attacks. These defenses change the properties of the platforms in order to make attacks more complicated. Unfortunately, little work has been done on measuring the effectiveness of these defenses. In this work, we first measure the protection provided by a dynamic platform technique on a testbed. The counter-intuitive results obtained from the testbed guide us in identifying and quantifying the major effects contributing to the protection in such a system. Based on the abstract effects, we develop a generalized model of dynamic platform techniques which can be used to quantify their effectiveness. To verify and validate our results, we simulate the generalized model and show that the testbed measurements and the simulations match with a small amount of error. Finally, we enumerate a number of lessons learned in our work which can be applied to the quantitative evaluation of other defensive techniques.
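
The flavor of such a quantitative model can be conveyed with a toy Monte Carlo simulation (not the paper's generalized model): platforms rotate every R seconds, an attack must complete within a single residency on a vulnerable platform, and only a fraction of the platform pool is vulnerable. All parameter values below are illustrative assumptions.

```python
import random

def attack_success_prob(rotation_period, attack_time, vulnerable_fraction,
                        horizon=3600.0, trials=10000):
    """Estimate the probability that an attack completes within the horizon.

    The attack needs `attack_time` seconds on a vulnerable platform and must
    finish before the next rotation; each rotation independently lands on a
    vulnerable platform with probability `vulnerable_fraction`.
    """
    successes = 0
    for _ in range(trials):
        t = 0.0
        while t < horizon:
            vulnerable = random.random() < vulnerable_fraction
            if vulnerable and rotation_period >= attack_time:
                successes += 1
                break
            t += rotation_period
    return successes / trials

if __name__ == "__main__":
    for period in (30, 120, 600):
        p = attack_success_prob(rotation_period=period, attack_time=300,
                                vulnerable_fraction=0.5)
        print(f"rotation every {period:4d}s -> success probability ~ {p:.2f}")
```

In this toy model, rotating faster than the attack can complete drives the success probability to zero, while slow rotation leaves the attacker many independent chances, which is the kind of effect the paper's model quantifies more carefully.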

Computing on masked data: a high performance method for improving big data veracity

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support of sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples are shown for the application of CMD to a complex DNA matching algorithm and to database operations over social media data.
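
A highly simplified sketch of the masking idea (not the CMD scheme itself): row and column keys of an associative array are replaced by keyed HMAC values before being stored, so a server can still group, join, or correlate on exact key matches without seeing the plaintext keys, and only the key holder can unmask the results. The secret key and data below are illustrative.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET = b"client-held masking key"   # never leaves the client

def mask(value):
    """Deterministic masking: equal plaintexts map to equal masked values."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()

# Client side: mask the keys of an associative array (entity -> set of records).
plaintext = {
    "alice": {"rec1", "rec3"},
    "bob":   {"rec2", "rec3"},
}
masked = {mask(k): {mask(r) for r in recs} for k, recs in plaintext.items()}

# Server side: operate directly on masked data, e.g. invert the array to find
# which records are shared by more than one (masked) entity.
by_record = defaultdict(set)
for entity, recs in masked.items():
    for r in recs:
        by_record[r].add(entity)
shared = {r for r, ents in by_record.items() if len(ents) > 1}

# Client side: only the key holder can relate masked values back to plaintext.
unmask_table = {mask(r): r for recs in plaintext.values() for r in recs}
print([unmask_table[r] for r in shared])   # ['rec3']
```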

A survey of cryptographic approaches to securing big-data analytics in the cloud

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

The growing demand for cloud computing motivates the need to study the security of data received, stored, processed, and transmitted by a cloud. In this paper, we present a framework for such a study. We introduce a cloud computing model that captures a rich class of big-data use-cases and allows reasoning about relevant threats and security goals. We then survey three cryptographic techniques - homomorphic encryption, verifiable computation, and multi-party computation - that can be used to achieve these goals. We describe the cryptographic techniques in the context of our cloud model and highlight the differences in performance cost associated with each.
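
As one tiny example of the multi-party computation idea surveyed here (additive secret sharing over a prime field, not any specific protocol from the paper), each party holds only random-looking shares, yet locally adding shares of two inputs yields shares of their sum.

```python
import random

P = 2**61 - 1   # prime modulus; the field size is an illustrative choice

def share(secret, n_parties=3):
    """Split a secret into n additive shares that sum to it modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two inputs held by different data owners.
x, y = 1234, 5678
sx, sy = share(x), share(y)

# Each party adds its own shares locally; no single party ever sees x or y.
sum_shares = [(a + b) % P for a, b in zip(sx, sy)]

print(reconstruct(sum_shares))   # 6912 == x + y
```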

A test-suite generator for database systems

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

In this paper, we describe the SPAR Test Suite Generator (STSG), a new test-suite generator for SQL-style database systems. This tool produced an entire test suite (data, queries, and ground-truth answers) as a unit and in response to a user's specification. Thus, database evaluators could use this tool to craft test suites for particular aspects of a specific database system. The inclusion of ground-truth answers in the produced test suite, furthermore, allowed this tool to support both benchmarking (at various scales) and correctness-checking in a repeatable way. Lastly, the test-suite generator was extensively profiled and optimized, and was designed for test-time agility.
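
The generate-data-plus-queries-plus-ground-truth pattern can be sketched in a few lines with SQLite; this is only an illustration of the concept, not the STSG tool or its specification language, and the schema, query, and row counts are made up for the example.

```python
import random
import sqlite3

def generate_suite(n_rows=1000, seed=7):
    """Produce (rows, query, ground_truth) as a single repeatable unit."""
    rng = random.Random(seed)                    # seeded for repeatability
    rows = [(i, rng.randrange(100)) for i in range(n_rows)]
    query = "SELECT COUNT(*) FROM t WHERE val < 10"
    ground_truth = sum(1 for _, v in rows if v < 10)
    return rows, query, ground_truth

def run_against(db_conn, rows, query):
    """Load the generated data into the system under test and run the query."""
    db_conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
    db_conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return db_conn.execute(query).fetchone()[0]

if __name__ == "__main__":
    rows, query, truth = generate_suite()
    result = run_against(sqlite3.connect(":memory:"), rows, query)
    print("correct" if result == truth else f"mismatch: {result} != {truth}")
```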

Big Data dimensional analysis

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. One of the main challenges associated with big data variety is automatically understanding the underlying structures and patterns of the data. Such an understanding is required as a prerequisite to the application of advanced analytics to the data. Further, big data sets often contain anomalies and errors that are difficult to know a priori. Current approaches to understanding data structure are drawn from traditional database ontology design. These approaches are effective, but often require too much human involvement to be practical for the volume, velocity, and variety of data encountered by big data systems. Dimensional Data Analysis (DDA) is a proposed technique that allows big data analysts to quickly understand the overall structure of a big dataset and determine anomalies. DDA exploits structures that exist in a wide class of data to quickly determine the nature of the data and its statistical anomalies. DDA leverages existing schemas that are employed in big data databases today. This paper presents DDA, applies it to a number of data sets, and measures its performance. The overhead of DDA is low, and it can be applied to existing big data systems without greatly impacting their computing requirements.
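
A back-of-the-envelope version of the kind of structural profile such an analysis computes (the actual DDA technique goes well beyond this) is a per-column summary of cardinality and value frequencies, which already surfaces obvious anomalies such as near-unique columns or outlier values. The data and column names below are illustrative.

```python
from collections import Counter

def profile(rows, columns):
    """Summarize each column: distinct-value count and the most common value."""
    summary = {}
    for i, name in enumerate(columns):
        counts = Counter(row[i] for row in rows)
        top_value, top_count = counts.most_common(1)[0]
        summary[name] = {
            "distinct": len(counts),
            "top_value": top_value,
            "top_fraction": top_count / len(rows),
        }
    return summary

rows = [
    ("alice", "web", 200),
    ("bob",   "web", 200),
    ("carol", "dns", 200),
    ("dave",  "web", 999999),   # suspicious outlier in the bytes column
]
for col, stats in profile(rows, ["user", "proto", "bytes"]).items():
    print(col, stats)
```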

Achieving 100,000,000 database inserts per second using Accumulo and D4M

Summary

The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved, which is 100x larger than the highest previously published value for any other database. The performance scales linearly with the number of ingest clients, number of database servers, and data size. The performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data programming.
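
The domain-decomposition idea behind this ingest scaling can be sketched without Accumulo at all: edges are hash-partitioned across independent ingest clients and sent in batches, so clients never contend for the same slice of the key space. The partition count and toy edge list below are illustrative; the actual system layers D4M and Accumulo on top of this pattern.

```python
import hashlib
from collections import defaultdict

N_CLIENTS = 8   # illustrative; the paper's cluster used far more ingest clients

def owner(start_vertex):
    """Hash-partition the vertex key space so each client owns a disjoint slice."""
    digest = hashlib.md5(str(start_vertex).encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_CLIENTS

def partition(edges):
    """Group Graph500-style edges into one batch per owning ingest client."""
    batches = defaultdict(list)
    for u, v in edges:
        batches[owner(u)].append((u, v))
    return batches

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (2, 4)]
for client, batch in sorted(partition(edges).items()):
    # Each client would stream its batch into the database independently.
    print(f"client {client}: {batch}")
```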

Genetic sequence matching using D4M big data approaches

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M) - an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and the implementation of an Apache Accumulo database to accelerate computations one hundred-fold over other methods. Comparisons of the D4M method with the current gold standard for sequence analysis, BLAST, show the two are comparable in the alignments they find. This paper will present an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST.
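
The core of the matching idea, representing each sequence by its constituent k-mers and correlating sequences by counting shared k-mers, can be sketched as below (plain Python sets rather than D4M associative arrays; k = 10 and the toy sequences are illustrative).

```python
from itertools import product

K = 10   # subsequence length; an illustrative choice

def kmers(seq, k=K):
    """Decompose a sequence into the set of its length-k subsequences."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def match_counts(samples, references):
    """Count shared k-mers between every sample/reference pair.

    In D4M this corresponds to a sparse multiply of two (sequence x kmer)
    associative arrays; here it is an explicit set intersection.
    """
    sample_kmers = {name: kmers(s) for name, s in samples.items()}
    ref_kmers = {name: kmers(s) for name, s in references.items()}
    return {(a, b): len(sample_kmers[a] & ref_kmers[b])
            for a, b in product(sample_kmers, ref_kmers)}

samples = {"sample1": "ACGTACGTACGTTTGCAACGT"}
references = {"refA": "ACGTACGTACGTTTGCA", "refB": "GGGGGGGGGGGGCCCCCCCC"}
for pair, shared in match_counts(samples, references).items():
    print(pair, "shared 10-mers:", shared)
```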