Publications

Refine Results

(Filters Applied) Clear All

Secure input validation in Rust with parsing-expression grammars

Published in:
Thesis (M.E.)--Massachusetts Institute of Technology, 2019.

Summary

Accepting input from the outside world is one of the most dangerous things a system can do. Since type information is lost across system boundaries, systems must perform type-specific input handling routines to recover this information. Adversaries can carefully craft input data to exploit any bugs or vulnerabilities in these routines, thereby causing dangerous memory errors. Including input validation routines in kernels is especially risky. Sensitive memory contents and powerful privileges make kernels a preferred target of attackers. Furthermore, the fact that kernels must process user input, network data, as well as input from a wide array of peripheral devices means that including such input validation schemes is unavoidable. In this thesis we present Automatic Validation of Input Data (AVID), which helps solve the issue of input validation within kernels by automatically generating parser implementations for developer-defined structs. AVID leverages not only the unambiguity guarantees of parsing expression grammars but also the type safety guarantees of Rust. We show how AVID can be used to resolve a manufactured vulnerability in Tock, an operating system written in Rust for embedded systems. Using Rust’s procedural macro system, AVID generates parser implementations at compile time based on existing Rust struct definitions. AVID exposes a simple and convenient parser API that is able to validate input and then instantiate structs from the validated input. AVID's simple interface makes it easy for developers to use and to integrate with existing codebases.
READ LESS

Summary

Accepting input from the outside world is one of the most dangerous things a system can do. Since type information is lost across system boundaries, systems must perform type-specific input handling routines to recover this information. Adversaries can carefully craft input data to exploit any bugs or vulnerabilities in these...

READ MORE

Security and performance analysis of custom memory allocators

Author:
Published in:
Thesis (M.E.)--Massachusetts Institute of Technology, 2019.

Summary

Computer programmers use custom memory allocators as an alternative to built-in or general-purpose memory allocators with the intent to improve performance and minimize human error. However, it is difficult to achieve both memory safety and performance gains on custom memory allocators. In this thesis, we study the relationship between memory safety and custom allocators. We analyze three popular servers, Apache, Nginx, and Appweb, and show that while the performance benefits might exist in the unprotected version of the server, as soon as partial or full memory safety is enforced, the picture becomes much more complex. Based on the target, using a custom memory allocator might be faster, about the same, or slower than the system memory allocator. Another caveat is that custom memory allocation can only be protected partially (at the allocation granularity) without manual modification. In addition, custom memory allocators may also introduce additional vulnerabilities to an application (e.g., OpenSSL Heartbleed). We thus conclude that using custom memory allocators is very nuanced, and that the challenges they pose may outweigh the small performance gains in the unprotected mode in many cases. Our findings suggest that developers must carefully consider the trade-offs and caveats of using a custom memory allocator before deploying it in their project.
READ LESS

Summary

Computer programmers use custom memory allocators as an alternative to built-in or general-purpose memory allocators with the intent to improve performance and minimize human error. However, it is difficult to achieve both memory safety and performance gains on custom memory allocators. In this thesis, we study the relationship between memory...

READ MORE

Rulemaking for insider threat mitigation

Published in:
Chapter 12, Cyber Resilience of Systems and Networks, 2019, pp. 265-86.

Summary

This chapter continues the topic we started to discuss in the previous chapter – the human factors. However, it focuses on a specific method of enhancing cyber resilience via establishing appropriate rules for employees of an organization under consideration. Such rules aim at reducing threats from, for example, current or former employees, contractors, and business partners who intentionally use their authorized access to an organization to harm the organization. System users can also unintentionally contribute to cyber-attacks, or themselves become a passive target of a cyber-attack. The implementation of work-related rules is intended to decrease such risks. However, rules implementation can also increase the risks that arise from employee disregard for rules. This can occur when the rules become too restrictive, and employees become more likely to disregard the rules. Furthermore, the more often employees disregard the rules both intentionally and unintentionally, the more likely insider threats are able to observe and mimic employee behavior. This chapter shows how to find an intermediate, optimal collection of rules between the two extremes of "too many rules" and "not enough rules."
READ LESS

Summary

This chapter continues the topic we started to discuss in the previous chapter – the human factors. However, it focuses on a specific method of enhancing cyber resilience via establishing appropriate rules for employees of an organization under consideration. Such rules aim at reducing threats from, for example, current or...

READ MORE

Detecting food safety risks and human trafficking using interpretable machine learning methods

Author:
Published in:
Thesis (M.S.)--Massachusetts Institute of Technology, 2019.

Summary

Black box machine learning methods have allowed researchers to design accurate models using large amounts of data at the cost of interpretability. Model interpretability not only improves user buy-in, but in many cases provides users with important information. Especially in the case of the classification problems addressed in this thesis, the ideal model should not only provide accurate predictions, but should also inform users of how features affect the results. My research goal is to solve real-world problems and compare how different classification models affect the outcomes and interpretability. To this end, this thesis is divided into two parts: food safety risk analysis and human trafficking detection. The first half analyzes the characteristics of supermarket suppliers in China that indicate a high risk of food safety violations. Contrary to expectations, supply chain dispersion, internal inspections, and quality certification systems are not found to be predictive of food safety risk in our data. The second half focuses on identifying human trafficking, specifically sex trafficking, advertisements hidden amongst online classified escort service advertisements. We propose a novel but interpretable keyword detection and modeling pipeline that is more accurate and actionable than current neural network approaches. The algorithms and applications presented in this thesis succeed in providing users with not just classifications but also the characteristics that indicate food safety risk and human trafficking ads.
READ LESS

Summary

Black box machine learning methods have allowed researchers to design accurate models using large amounts of data at the cost of interpretability. Model interpretability not only improves user buy-in, but in many cases provides users with important information. Especially in the case of the classification problems addressed in this thesis...

READ MORE

Detection and characterization of human trafficking networks using unsupervised scalable text template matching

Summary

Human trafficking is a form of modern-day slavery affecting an estimated 40 million victims worldwide, primarily through the commercial sexual exploitation of women and children. In the last decade, the advertising of victims has moved from the streets to websites on the Internet, providing greater efficiency and anonymity for sex traffickers. This shift has allowed traffickers to list their victims in multiple geographic areas simultaneously, while also improving operational security by using multiple methods of electronic communication with buyers; complicating the ability of law enforcement to disrupt these illicit organizations. In this paper, we address this issue and present a novel unsupervised and scalable template matching algorithm for analyzing and detecting complex organizations operating on adult service websites. The algorithm uses only the advertisement content to uncover signature patterns in text that are indicative of organized activities and organizational structure. We apply this method to a large corpus of adult service advertisements retrieved from backpage.com, and show that the networks identified through the algorithm match well with surrogate truth data derived from phone number networks in the same corpus. Further exploration of the results show that the proposed method provides deeper insights into the complex structures of sex trafficking organizations, not possible through networks derived from phone numbers alone. This method provides a powerful new capability for law enforcement to more completely identify and gather evidence about trafficking networks and their operations.
READ LESS

Summary

Human trafficking is a form of modern-day slavery affecting an estimated 40 million victims worldwide, primarily through the commercial sexual exploitation of women and children. In the last decade, the advertising of victims has moved from the streets to websites on the Internet, providing greater efficiency and anonymity for sex...

READ MORE

Leveraging Intel SGX technology to protect security-sensitive applications

Published in:
17th IEEE Int. Symp. on Network Computing and Applications, NCA, 1-3 November 2018.

Summary

This paper explains the process by which Intel Software Guard Extensions (SGX) can be leveraged into an existing codebase to protect a security-sensitive application. Intel SGX provides user-level applications with hardware-enforced confidentiality and integrity protections and incurs manageable impact on performance. These protections apply to all three phases of the operational data lifecycle: at rest, in use, and in transit. SGX shrinks the trusted computing base (and therefore the attack surface) of the application to only the hardware on the CPU chip and the portion of the application's software that is executed within the protected enclave. The SDK enables SGX integration into existing C/C++ codebases while still ensuring program support for legacy and non-Intel platforms. This paper is the first published work to walk through the step-by-step process of Intel SGX integration with examples and performance results from an actual cryptographic application produced in a standard Linux development environment.
READ LESS

Summary

This paper explains the process by which Intel Software Guard Extensions (SGX) can be leveraged into an existing codebase to protect a security-sensitive application. Intel SGX provides user-level applications with hardware-enforced confidentiality and integrity protections and incurs manageable impact on performance. These protections apply to all three phases of the...

READ MORE

OS independent and hardware-assisted insider threat detection and prevention framework

Summary

Governmental and military institutions harbor critical infrastructure and highly confidential information. Although institutions are investing a lot for protecting their data and assets from possible outsider attacks, insiders are still a distrustful source of information leakage. As malicious software injection is one among many attacks, turning innocent employees into malicious attackers through social attacks is the most impactful one. Malicious insiders or uneducated employees are dangerous for organizations that they are already behind the perimeter protections that guard the digital assets; actually, they are trojans on their own. For an insider, the easiest possible way for creating a hole in security is using the popular and ubiquitous Universal Serial Bus (USB) devices due to its versatile and easy to use plug-and-play nature. USB type storage devices are the biggest threats for contaminating mission critical infrastructure with viruses, malware, and trojans. USB human interface devices are also dangerous as they may connect to a host with destructive hidden functionalities. In this paper, we propose a novel hardware-assisted insider threat detection and prevention framework for the USB case. Our novel framework is also OS independent. We implemented a proof-of-concept design on an FPGA board which is widely used in military settings supporting critical missions, and demonstrated the results considering different experiments. Based on the results of these experiments, we show that our framework can identify rapid-keyboard key-stroke attacks and can easily detect the functionality of the USB device plugged in. We present the resource consumption of our framework on the FPGA for its utilization on a host controller device. We show that our hard-to-tamper framework introduces no overhead in USB communication in terms of user experience.
READ LESS

Summary

Governmental and military institutions harbor critical infrastructure and highly confidential information. Although institutions are investing a lot for protecting their data and assets from possible outsider attacks, insiders are still a distrustful source of information leakage. As malicious software injection is one among many attacks, turning innocent employees into malicious...

READ MORE

Cross-app poisoning in software-defined networking

Published in:
Proc. ACM Conf. on Computer and Communications Security, CCS, 15-18 October 2018, pp. 648-63.

Summary

Software-defined networking (SDN) continues to grow in popularity because of its programmable and extensible control plane realized through network applications (apps). However, apps introduce significant security challenges that can systemically disrupt network operations, since apps must access or modify data in a shared control plane state. If our understanding of how such data propagate within the control plane is inadequate, apps can co-opt other apps, causing them to poison the control plane's integrity. We present a class of SDN control plane integrity attacks that we call cross-app poisoning (CAP), in which an unprivileged app manipulates the shared control plane state to trick a privileged app into taking actions on its behalf. We demonstrate how role-based access control (RBAC) schemes are insufficient for preventing such attacks because they neither track information flow nor enforce information flow control (IFC). We also present a defense, ProvSDN, that uses data provenance to track information flow and serves as an online reference monitor to prevent CAP attacks. We implement ProvSDN on the ONOS SDN controller and demonstrate that information flow can be tracked with low-latency overheads.
READ LESS

Summary

Software-defined networking (SDN) continues to grow in popularity because of its programmable and extensible control plane realized through network applications (apps). However, apps introduce significant security challenges that can systemically disrupt network operations, since apps must access or modify data in a shared control plane state. If our understanding of...

READ MORE

Designing secure and resilient embedded avionics systems

Summary

With an increased reliance on Unmanned Aerial Systems (UAS) as mission assets and the dependency of UAS on cyber resources, cyber security of UAS must be improved by adopting sound security principles and relevant technologies from the computing community. On the other hand, the traditional avionics community, being aware of the importance of cyber security, is looking at new architecture and designs that can accommodate both the safety oriented principles as well as the cyber security principles and techniques. The Air Force Research Laboratories (AFRL) Information Directorate has created the Agile Resilient Embedded System (ARES) program to investigate mitigations that offer a method to "design-in" cyber protections while maintaining mission assurance. ARES specifically seeks to 'build security in' for unmanned aerial vehicles incorporating security and hardening best practices, while inserting resilience as a system attribute to maintain a level of system operation despite successful exploitation of residual vulnerabilities.
READ LESS

Summary

With an increased reliance on Unmanned Aerial Systems (UAS) as mission assets and the dependency of UAS on cyber resources, cyber security of UAS must be improved by adopting sound security principles and relevant technologies from the computing community. On the other hand, the traditional avionics community, being aware of...

READ MORE

Hyperscaling internet graph analysis with D4M on the MIT SuperCloud

Summary

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environments, massive amount of packet capture (PCAP) data, and diverse data products for "at scale" algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud. The entire pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
READ LESS

Summary

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of...

READ MORE