Publications

Using a power law distribution to describe big data

Published in:
HPEC 2015: IEEE Conf. on High Performance Extreme Computing, 15-17 September 2015.

Summary

The gap between data production and users' ability to access, compute on, and produce meaningful results from that data calls for tools that address the challenges associated with big data volume, velocity, and variety. One of the key hurdles is the inability to methodically remove expected or uninteresting elements from large data sets. This difficulty often wastes valuable researcher and computational time by expending resources on uninteresting parts of the data. Social sensors, or sensors that produce data based on human activity, such as Wikipedia, Twitter, and Facebook, have an underlying structure that can be modeled with a power law distribution. Such a distribution implies that a few nodes generate large amounts of data. In this article, we propose a technique that takes an arbitrary dataset and computes a power law distributed background model whose parameters are based on observed statistics. This model can be used to determine whether a power law is a suitable fit or to automatically identify high-degree nodes for filtering, and it can be scaled to work with big data.
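
As a hedged illustration of the filtering idea described above, the sketch below fits a power-law exponent to observed node degrees using the standard maximum-likelihood approximation and flags the highest-degree nodes as candidates for removal. It is a minimal example with placeholder names, not the paper's background-model construction.

```python
# Minimal sketch: estimate a power-law exponent from observed node degrees
# (standard MLE approximation, Clauset et al.) and flag high-degree nodes
# for filtering. Illustrative only; not the paper's background model.
import math

def fit_alpha(degrees, d_min=1):
    """alpha_hat = 1 + n / sum(ln(d_i / (d_min - 0.5))) over degrees >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    return 1.0 + len(tail) / sum(math.log(d / (d_min - 0.5)) for d in tail)

def high_degree_nodes(degree_by_node, quantile=0.99):
    """Return node IDs above the given degree quantile (candidates to filter)."""
    cutoff = sorted(degree_by_node.values())[int(quantile * (len(degree_by_node) - 1))]
    return [n for n, d in degree_by_node.items() if d > cutoff]

# Example: degrees observed from a social-sensor graph.
degrees = {"user%d" % i: d for i, d in enumerate([1, 1, 2, 1, 3, 1, 2, 5, 1, 120])}
print(fit_alpha(list(degrees.values())))
print(high_degree_nodes(degrees, quantile=0.9))  # -> ['user9']
```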

Iris biometric security challenges and possible solutions: for your eyes only? Using the iris as a key

Summary

Biometrics were originally developed for identification, such as for criminal investigations. More recently, biometrics have also been used for authentication. Most biometric authentication systems today match a user's biometric reading against a stored reference template generated during enrollment. If the reading and the template are sufficiently close, the authentication is considered successful and the user is authorized to access protected resources. This binary matching approach has major inherent vulnerabilities. An alternative approach to biometric authentication proposes to use fuzzy extractors (also known as biometric cryptosystems), which derive cryptographic keys from noisy sources such as biometrics. In theory, this approach is much more robust and can enable cryptographic authorization. Unfortunately, for many biometrics that provide high-quality identification, fuzzy extractors provide no security guarantees. This gap arises in part from an objective mismatch: the quality of a biometric identification is typically measured using false match rate (FMR) versus false nonmatch rate (FNMR), and biometrics have been extensively optimized for this metric. However, this metric says little about the suitability of a biometric for key derivation. In this article, we illustrate a metric that can be used to optimize biometrics for authentication. Using iris biometrics as an example, we explore possible directions for improving processing and representation according to this metric. Finally, we discuss why strong biometric authentication remains a challenging problem and propose some possible future directions for addressing these challenges.
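
For readers unfamiliar with fuzzy extractors, the sketch below shows the classic code-offset construction on binary strings, using a simple repetition code to absorb a few bit flips between enrollment and authentication. It is an illustrative toy with placeholder names, not any construction evaluated in the paper.

```python
# Toy code-offset fuzzy extractor over binary strings, using a repetition
# code for error correction. Illustrative only; real schemes use stronger
# codes and entropy extraction.
import secrets

R = 5  # repetition factor; majority vote corrects up to 2 flips per key bit

def _encode(key_bits):
    # Repetition code: repeat each key bit R times.
    return [b for b in key_bits for _ in range(R)]

def _decode(codeword_bits):
    # Majority vote over each block of R bits.
    return [int(sum(codeword_bits[i * R:(i + 1) * R]) > R // 2)
            for i in range(len(codeword_bits) // R)]

def gen(w):
    """Enrollment: derive a random key and public helper data from biometric bits w."""
    key = [secrets.randbelow(2) for _ in range(len(w) // R)]
    helper = [wi ^ ci for wi, ci in zip(w, _encode(key))]
    return key, helper

def rep(w_noisy, helper):
    """Authentication: recover the key from a noisy re-reading of the biometric."""
    return _decode([wi ^ hi for wi, hi in zip(w_noisy, helper)])

# Toy 10-bit "iris code"; one bit flips at authentication time.
w = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
key, helper = gen(w)
w_noisy = w.copy(); w_noisy[3] ^= 1
assert rep(w_noisy, helper) == key
```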

Missing the point(er): on the effectiveness of code pointer integrity

Summary

Memory corruption attacks continue to be a major vector for compromising modern systems. Numerous defenses have been proposed against memory corruption attacks, but they all have limitations and weaknesses. Stronger defenses, such as complete memory safety for legacy languages (C/C++), incur a large overhead, while weaker ones, such as practical control flow integrity, have been shown to be ineffective. A recent technique called code pointer integrity (CPI) promises to balance security and performance by focusing memory safety on code pointers, thus preventing most control-hijacking attacks while maintaining low overhead. CPI protects access to code pointers by storing them in a safe region that is protected by instruction-level isolation. On x86-32, this isolation is enforced by hardware; on x86-64 and ARM, it is enforced by information hiding. We show that, on architectures that do not support segmentation, where CPI relies on information hiding, CPI's safe region can be leaked and then maliciously modified through data pointer overwrites. We implement a proof-of-concept exploit against Nginx and successfully bypass CPI implementations that rely on information hiding in 6 seconds with 13 observed crashes. We also present an attack that generates no crashes and is able to bypass CPI in 98 hours. Our attack demonstrates the importance of adequately protecting secrets in security mechanisms and the dangers of relying on the difficulty of guessing without guaranteeing the absence of memory leaks.

Unifying leakage classes: simulatable leakage and pseudoentropy

Published in:
8th Int. Conf. Information-Theoretic Security (ICITS 2015), 2-5 May 2015 in Lecture Notes in Computer Science (LNCS), Vol. 9063, 2015, pp. 69-86.

Summary

Leakage-resilient cryptography designs systems to withstand partial adversary knowledge of secret state. Ideally, leakage-resilient systems withstand current and future attacks, restoring confidence in the security of implemented cryptographic systems. Understanding the relation between classes of leakage functions is an important part of this goal. In this work, we consider the memory leakage model, where the leakage class contains functions over the system's entire secret state. Standard limitations include functions with bounded output length, functions that retain (pseudo)entropy in the secret, and functions that leave the secret computationally unpredictable. Standaert, Pereira, and Yu (Crypto, 2013) introduced a new class of leakage functions they call simulatable leakage. A leakage function is simulatable if a simulator can produce indistinguishable leakage without access to the true secret state. We extend their notion to general applications and consider two versions. For weak simulatability, the simulated leakage must be indistinguishable from the true leakage in the presence of public information. For strong simulatability, this requirement must also hold when the distinguisher has access to the true secret state. We show the following:
--Weakly simulatable functions retain computational unpredictability.
--Strongly simulatable functions retain pseudoentropy.
--There are bounded-length functions that are not weakly simulatable.
--There are weakly simulatable functions that remove pseudoentropy.
--There are leakage functions that retain computational unpredictability but are not weakly simulatable.
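
As a rough formalization of the two notions above (under assumed notation, not the paper's exact definitions), a leakage function L on secret state sk with public information pub is weakly simulatable if there is an efficient simulator S such that for every efficient distinguisher D,

\[
\bigl| \Pr[D(\mathsf{pub}, L(\mathsf{sk})) = 1] - \Pr[D(\mathsf{pub}, S(\mathsf{pub})) = 1] \bigr| \le \mathsf{negl}(\lambda),
\]

and strongly simulatable if the same bound holds even when D additionally receives the true secret state:

\[
\bigl| \Pr[D(\mathsf{sk}, \mathsf{pub}, L(\mathsf{sk})) = 1] - \Pr[D(\mathsf{sk}, \mathsf{pub}, S(\mathsf{pub})) = 1] \bigr| \le \mathsf{negl}(\lambda).
\]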

Computing on Masked Data to improve the security of big data

Published in:
HST 2015, IEEE Int. Conf. on Technologies for Homeland Security, 14-16 April 2015.

Summary

Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need for improving the security of data stored in untrusted servers or databases. Advances in cryptographic techniques and database technologies provide the necessary security functionality but rely on a computational model in which the cloud is used solely for storage and retrieval. Much of big data computation and analytics makes use of signal processing fundamentals. As the trend of moving data storage and computation to the cloud increases, homeland security missions should understand the impact of security on key signal processing kernels such as correlation or thresholding. In this article, we propose a tool called Computing on Masked Data (CMD), which combines advances in database technologies and cryptographic tools to provide a low-overhead mechanism for securely offloading certain mathematical operations to the cloud. This article describes the design and development of the CMD tool.
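
To make the idea concrete, the sketch below shows one masking style a CMD-like system can use: a deterministic keyed mask that lets an untrusted server match and count equal values without seeing the plaintexts. It is a minimal illustration with placeholder names, not the CMD implementation, which combines several masking schemes.

```python
# One masking style a CMD-like system can use: deterministic keyed masking,
# which lets an untrusted server match and count equal values without ever
# seeing the plaintexts. Placeholder names; not the CMD implementation.
import hmac, hashlib
from collections import Counter

SECRET_KEY = b"client-held key"  # stays with the data owner

def mask(value: str) -> str:
    """Deterministic keyed mask: equal plaintexts yield equal masks."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

# The client masks records before uploading them to the untrusted database.
records = ["alice", "bob", "alice", "carol"]
masked_records = [mask(r) for r in records]

# The server can group and count on masked values alone.
histogram = Counter(masked_records)

# The client queries by re-masking its search term.
print(histogram[mask("alice")])  # -> 2
```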

Rapid sequence identification of potential pathogens using techniques from sparse linear algebra

Summary

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling and computational power of the Dynamic Distributed Dimensional Data Model (D4M). The method leverages linear algebra and statistical properties to increase computational performance while retaining accuracy by subsampling the data. Two run modes, Fast and Wise, yield speed and precision tradeoffs, with applications in biodefense and medical diagnostics. The D4RAGenS analysis algorithm is tested on several datasets, including three used for the Defense Threat Reduction Agency (DTRA) metagenomic algorithm contest.
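
As a rough sketch of the sparse-linear-algebra framing (not the D4M/D4RAGenS implementation; all names below are illustrative), reads and reference genomes can be encoded as sparse k-mer incidence matrices whose product scores each read against each reference:

```python
# Illustrative sketch: sequence identification as sparse linear algebra.
# Reads and references are encoded as sparse k-mer incidence matrices;
# their product counts shared k-mers per (read, reference) pair.
import numpy as np
from scipy.sparse import lil_matrix

K = 4
def kmers(seq):
    return {seq[i:i + K] for i in range(len(seq) - K + 1)}

reads = ["ACGTACGT", "TTTTGGGG"]
refs = {"organismA": "ACGTACGTACGT", "organismB": "TTTTGGGGTTTT"}

# Shared k-mer vocabulary across reads and references.
vocab = sorted(set().union(*(kmers(s) for s in list(reads) + list(refs.values()))))
idx = {km: j for j, km in enumerate(vocab)}

def incidence(seqs):
    m = lil_matrix((len(seqs), len(vocab)), dtype=np.int8)
    for i, s in enumerate(seqs):
        for km in kmers(s):
            m[i, idx[km]] = 1
    return m.tocsr()

A = incidence(reads)                 # reads x k-mers
B = incidence(list(refs.values()))   # references x k-mers

scores = (A @ B.T).toarray()         # shared-k-mer counts
for i, read in enumerate(reads):
    best = int(np.argmax(scores[i]))
    print(read, "->", list(refs)[best], scores[i])
```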

Cryptographically secure computation

Published in:
Computer, Vol. 48, No. 4, April 2015, pp. 78-81.

Summary

Researchers are making secure multiparty computation--a cryptographic technique that enables information sharing and analysis while keeping sensitive inputs secret--faster and easier to use for application software developers.
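
As background for the technique named above, the sketch below shows one of its simplest building blocks, additive secret sharing over a prime field: each party holds a random-looking share, no single share reveals the input, and local addition of shares yields a sharing of the sum. This is an illustrative fragment, not any of the systems discussed in the article.

```python
# Minimal sketch of one MPC building block: additive secret sharing over a
# prime field. Illustrative only.
import secrets

P = 2**61 - 1  # prime modulus

def share(x, n_parties=3):
    """Split x into n random-looking shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two secrets, each shared among three parties.
a_shares, b_shares = share(25), share(17)

# Each party adds its own shares locally; no party learns a or b.
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # -> 42
```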

HEtest: a homomorphic encryption testing framework

Published in:
3rd Workshop on Encrypted Computing and Applied Homomorphic Cryptography (WAHC 2015), 30 January 2015.

Summary

In this work, we present a generic open-source software framework that can evaluate the correctness and performance of homomorphic encryption software. Our framework, called HEtest, automates the entire process of a test: generation of data for testing (such as circuits and inputs), execution of a test, comparison of performance to an insecure baseline, statistical analysis of the test results, and production of a LaTeX report. To illustrate the capability of our framework, we present a case study of our analysis of the open-source HElib homomorphic encryption software. We stress, though, that HEtest is written in a modular fashion, so it can easily be adapted to test any homomorphic encryption software.
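
The outline below sketches the shape of such a test loop: generate inputs, run the system under test and an insecure plaintext baseline, and compare correctness and timing. The function names are hypothetical placeholders, not the HEtest API.

```python
# Shape of a correctness/performance test loop against an insecure baseline.
# Hypothetical placeholders; not the HEtest API.
import random, time

def generate_inputs(n=100, bits=8):
    return [random.getrandbits(bits) for _ in range(n)]

def baseline_circuit(xs):
    return sum(xs)  # insecure plaintext reference computation

def run_timed(fn, xs):
    start = time.perf_counter()
    out = fn(xs)
    return out, time.perf_counter() - start

def evaluate(system_under_test, trials=10):
    """Run trials and record correctness plus slowdown vs. the baseline."""
    results = []
    for _ in range(trials):
        xs = generate_inputs()
        expected, t_base = run_timed(baseline_circuit, xs)
        actual, t_sut = run_timed(system_under_test, xs)
        results.append({"correct": actual == expected,
                        "slowdown": t_sut / t_base if t_base else float("inf")})
    return results
```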

Using a big data database to identify pathogens in protein data space [e-print]

Summary

Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors/false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate goal of rapid pathogen identification in patient samples. Our approach uses the ability of big data databases to hold large, sparse associative array representations of genetic data in order to extract statistical patterns that can be used in a variety of ways to improve identification algorithms.
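
The sketch below illustrates what such a sparse associative-array representation might look like in protein space: rows keyed by sample ID, columns keyed by amino-acid k-mers, with counts as values, from which simple background statistics can be read off. The layout and names are illustrative assumptions, not the paper's schema.

```python
# Toy sparse associative array in protein space:
# (sample ID, amino-acid k-mer) -> count, stored as a dict of Counters.
from collections import defaultdict, Counter

K = 3  # amino-acid k-mer length

samples = {
    "patient_01": "MKTAYIAKQR",
    "patient_02": "MKTAYLLKQR",
}

assoc = defaultdict(Counter)  # assoc[sample][kmer] = count
for sample, protein in samples.items():
    for i in range(len(protein) - K + 1):
        assoc[sample][protein[i:i + K]] += 1

# Background statistic: in how many samples does each k-mer appear?
kmer_prevalence = Counter()
for sample, counts in assoc.items():
    kmer_prevalence.update(counts.keys())

# K-mers present in every sample look like expected background;
# rarer k-mers are more informative for identification.
shared = [km for km, n in kmer_prevalence.items() if n == len(samples)]
print(shared)
```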

Automated assessment of secure search systems

Summary

This work presents the results of a three-year project that assessed nine different privacy-preserving data search systems. We detail the design of a software assessment framework that focuses on low system footprint, repeatability, and reusability. A unique achievement of this project was the automation and integration of the entire test process, from the production and execution of tests to the generation of human-readable evaluation reports. We synthesize our experiences into a set of simple mantras that we recommend following in the design of any assessment framework.