Publications


Exploiting morphological, grammatical, and semantic correlates for improved text difficulty assessment

Published in:
Proc. 9th Workshop on Innovative Use of NLP for Building Educational Applications, 26 June 2014, pp. 155-162.

Summary

We present a low-resource, language-independent system for text difficulty assessment. We replicate and improve upon a baseline by Shen et al. (2013) on the Interagency Language Roundtable (ILR) scale. Our work demonstrates that adding morphological, information-theoretic, and language-modeling features to a traditional readability baseline substantially improves performance. We use the Margin-Infused Relaxed Algorithm and Support Vector Machines for experiments on Arabic, Dari, English, and Pashto, and provide a detailed analysis of our results.
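The classification setup can be sketched with a plain SVM over toy surface features. This is only an illustration: the paper's actual morphological, information-theoretic, and language-model features, the MIRA learner, and the ILR-labeled data are not reproduced here, and the documents and labels below are hypothetical.

```python
# Sketch: difficulty classification with an SVM over simple surface
# features (average token length, type-token ratio, document length).
import numpy as np
from sklearn.svm import LinearSVC

def surface_features(text):
    tokens = text.split()
    avg_len = sum(len(t) for t in tokens) / len(tokens)  # mean word length
    ttr = len(set(tokens)) / len(tokens)                 # type-token ratio
    return [avg_len, ttr, len(tokens)]

# Toy documents with toy ILR-style difficulty labels.
docs = ["the cat sat on the mat",
        "quantum decoherence precludes macroscopic superposition phenomena"]
labels = [1, 3]

X = np.array([surface_features(d) for d in docs])
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```

Real readability baselines add features such as sentence length and syllable counts; the point here is only the feature-vector-plus-linear-classifier shape of the pipeline.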

Audio-visual identity grounding for enabling cross media search

Published in:
IEEE Computer Vision and Pattern Recognition Big Data Workshop, 23 June 2014.

Summary

Automatically searching for media clips in large heterogeneous datasets is an inherently difficult challenge, and nearly impossible when searching across distinct media types (e.g. finding audio clips that match an image). In this paper we introduce identity grounding to enable this cross-media search and exploration capability. Through grounding, we leverage one media channel (e.g. visual identity) as a noisy label for training a model in a different channel (e.g. an audio speaker model). Finally, we demonstrate this search capability using images from the Labeled Faces in the Wild (LFW) dataset to query audio files extracted from the YouTube Faces (YTF) dataset.

A new multiple choice comprehension test for MT

Published in:
Automatic and Manual Metrics for Operational Translation Evaluation Workshop, 9th Int. Conf. on Language Resources and Evaluation (LREC 2014), 26 May 2014.

Summary

We present results from a new machine translation comprehension test, similar to those developed in previous work (Jones et al., 2007). This test has documents in four conditions: (1) original English documents; (2) human translations of the documents into Arabic; conditions (3) and (4) are machine translations of the Arabic documents into English from two different MT systems. We created two forms of the test: Form A has the original English documents and output from the two Arabic-to-English MT systems. Form B has English, Arabic, and one of the MT system outputs. We administered the comprehension test to three subject types recruited in the greater Boston area: (1) native English speakers with no Arabic skills, (2) Arabic language learners, and (3) native Arabic speakers who also have English language skills. There were 36 native English speakers, 13 Arabic learners, and 11 native Arabic speakers with English skills. Subjects needed an average of 3.8 hours to complete the test, which had 191 questions and 59 documents. Native English speakers with no Arabic skills saw Form A; Arabic learners and native Arabic speakers saw Form B.

Standardized ILR-based and task-based speech-to-speech MT evaluation

Published in:
Automatic and Manual Metrics for Operational Translation Evaluation Workshop, 9th Int. Conf. on Language Resources and Evaluation (LREC 2014), 26 May 2014.

Summary

This paper describes a new method for task-based speech-to-speech machine translation evaluation, in which tasks are defined and assessed according to independent published standards, both for the military tasks performed and for the foreign language skill levels used. We analyze task success rates and automatic MT evaluation scores (BLEU and METEOR) for 220 role-play dialogs. Each role-play team consisted of one native English-speaking soldier role player, one native Pashto-speaking local national role player, and one Pashto/English interpreter. The overall PASS score, averaged over all of the MT dialogs, was 44%. The average PASS rate for human translation (HT) was 95%, which is important because a PASS requires that the role players know the tasks. Without a high PASS rate in the HT condition, we could not be sure that the MT condition was not being unfairly penalized. We learned that success rates depended as much on task simplicity as they did upon the translation condition: 67% of simple, base-case scenarios were successfully completed using MT, whereas only 35% of contrasting scenarios with even minor obstacles received passing scores. We observed that MT had the greatest chance of success when the task was simple and the language complexity needs were low.
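The BLEU score used above is built from modified n-gram precision (Papineni et al., 2002), in which each candidate n-gram count is clipped to its count in the reference. A minimal unigram sketch, not the evaluation pipeline used in the paper:

```python
# Sketch: modified unigram precision, the basic building block of BLEU.
from collections import Counter

def unigram_precision(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    # Clip each candidate word's count to its count in the reference.
    clipped = sum(min(n, ref[w]) for w, n in cand.items())
    return clipped / sum(cand.values())

print(unigram_precision("the the cat", "the cat sat"))  # 2/3: "the" clipped to 1
```

Full BLEU combines clipped precisions for n-grams up to length 4 with a brevity penalty; METEOR additionally aligns stems and synonyms.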

Development and use of a comprehensive humanitarian assessment tool in post-earthquake Haiti

Summary

This paper describes a comprehensive humanitarian assessment tool designed and used following the January 2010 Haiti earthquake. The tool was developed under Joint Task Force -- Haiti coordination using indicators of humanitarian needs to support decision making by the United States Government, agencies of the United Nations, and various non-governmental organizations. A set of questions and data collection methodology were developed by a collaborative process involving a broad segment of the Haiti humanitarian relief community and used to conduct surveys in internally displaced person settlements and surrounding communities for a four-month period starting on 15 March 2010. Key considerations in the development of the assessment tool and data collection methodology, representative analysis results, and observations from the operational use of the tool for decision making are reported. The paper concludes with lessons learned and recommendations for design and use of similar tools in the future.

Robust keys from physical unclonable functions

Published in:
Proc. 2014 IEEE Int. Symp. on Hardware-Oriented Security and Trust, HOST, 6-7 May 2014.

Summary

Weak physical unclonable functions (PUFs) can instantiate read-proof hardware tokens (Tuyls et al. 2006, CHES) where benign variation, such as changing temperature, yields a consistent key, but invasive attempts to learn the key destroy it. Previous approaches evaluate security by measuring how much an invasive attack changes the derived key (Pappu et al. 2002, Science). If some attack insufficiently changes the derived key, an expert must redesign the hardware. An unexplored alternative uses software to enhance token response to known physical attacks. Our approach draws on machine learning. We propose a variant of linear discriminant analysis (LDA), called PUF LDA, which reduces noise levels in PUF instances while enhancing changes from known attacks. We compare PUF LDA with standard techniques using an optical coating PUF and the following feature types: raw pixels, fast Fourier transform, short-time Fourier transform, and wavelets. We measure the true positive rate for valid detection at a 0% false positive rate (no mistakes on samples taken after an attack). PUF LDA improves the true positive rate from 50% on average (with a large variance across PUFs) to near 100%. While a well-designed physical process is irreplaceable, PUF LDA enables system designers to improve the PUF reliability-security tradeoff by incorporating attacks without redesigning the hardware token.
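The paper's PUF LDA is a variant of standard linear discriminant analysis; the shape of the idea can be sketched with plain LDA separating pre-attack from post-attack responses. The data below is synthetic and the class separation is invented for illustration:

```python
# Sketch: standard LDA distinguishing valid PUF responses from responses
# after a simulated invasive attack (modeled here as a mean shift).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
valid  = rng.normal(0.0, 0.1, size=(50, 8))   # noisy but stable responses
attack = rng.normal(0.5, 0.1, size=(50, 8))   # shifted responses post-attack
X = np.vstack([valid, attack])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.score(X, y))  # near 1.0: the classes are well separated
```

The paper's variant additionally suppresses benign noise directions (e.g. temperature variation) while amplifying directions changed by known attacks, which plain LDA does not do.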

Spectral subgraph detection with corrupt observations

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 4-9 May 2014.

Summary

Recent work on signal detection in graph-based data focuses on classical detection when the signal and noise are both in the form of discrete entities and their relationships. In practice, the relationships of interest may not be directly observable, or may be observed through a noisy mechanism. The effects of imperfect observations add another layer of difficulty to the detection problem, beyond the effects of typical random fluctuations in the background graph. This paper analyzes the impact on detection performance of several error and corruption mechanisms for graph data. In relatively simple scenarios, the change in signal and noise power is analyzed, and this is demonstrated empirically in more complicated models. It is shown that, with enough side information, it is possible to fully recover performance equivalent to working with uncorrupted data using a Bayesian approach, and a simpler cost-optimization approach is shown to provide a substantial benefit as well.
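The general flavor of spectral subgraph detection, before any corruption is introduced, can be illustrated with a toy example: a planted dense subgraph stands out in the leading eigenvector of the adjacency matrix. This is a generic illustration under an invented graph, not the paper's corruption analysis:

```python
# Sketch: spectral detection of a planted clique. The graph is an 8-clique
# (nodes 0..7) plus a sparse path on the remaining nodes, so the leading
# eigenvector concentrates on the clique.
import numpy as np

n, k = 40, 8
A = np.zeros((n, n))
A[:k, :k] = 1.0 - np.eye(k)              # planted clique on nodes 0..7
for i in range(k, n - 1):                # background: a path graph
    A[i, i + 1] = A[i + 1, i] = 1.0

w, V = np.linalg.eigh(A)                 # eigenvalues in ascending order
top = np.abs(V[:, -1])                   # leading eigenvector
detected = sorted(int(i) for i in np.argsort(top)[-k:])
print(detected)                          # [0, 1, 2, 3, 4, 5, 6, 7]
```

The corruption mechanisms studied in the paper (missing or spurious edges, noisy observations) perturb A, degrading how cleanly the signal separates in this spectrum.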

Adaptive attacker strategy development against moving target cyber defenses

Summary

A model of strategy formulation is used to study how an adaptive attacker learns to overcome a moving target cyber defense. The attacker-defender interaction is modeled as a game in which a defender deploys a temporal platform migration defense. Against this defense, a population of attackers develops strategies specifying the temporal ordering of resource investments that bring targeted zero-day exploits into existence. Attacker responses to two defender temporal platform migration scheduling policies are examined. In the first defender scheduling policy, the defender selects the active platform in each match uniformly at random from a pool of available platforms. In the second policy, the defender schedules each successive platform to maximize the diversity of the source code presented to the attacker. Adaptive attacker response strategies are modeled by finite state machine (FSM) constructs that evolve during simulated play against defender strategies via an evolutionary algorithm. It is demonstrated that the attacker learns to invest heavily in exploit creation for the platform with the least similarity to other platforms when faced with a diversity defense, while avoiding investment in exploits for this least similar platform when facing a randomization defense. Additionally, it is demonstrated that the diversity-maximizing defense is superior for shorter duration attacker-defender engagements, but performs sub-optimally in extended attacker-defender interactions.
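An FSM attacker strategy of the kind described above can be represented as a transition table: each state prescribes an investment action, and the defender's observed platform choice drives the transition. The states, actions, and platforms below are hypothetical, purely to show the construct:

```python
# Sketch: an attacker strategy as a finite state machine. Each state maps
# to (investment action, {observed defender platform: next state}).
FSM = {
    "probe":  ("invest_A", {"A": "commit", "B": "probe"}),
    "commit": ("invest_B", {"A": "commit", "B": "probe"}),
}

def run(fsm, observations, start="probe"):
    """Play the FSM against a sequence of observed defender platforms."""
    state, actions = start, []
    for obs in observations:
        action, transitions = fsm[state]
        actions.append(action)          # invest before observing the outcome
        state = transitions[obs]
    return actions

print(run(FSM, ["A", "A", "B"]))  # ['invest_A', 'invest_B', 'invest_B']
```

In the study, the transition tables themselves are what the evolutionary algorithm mutates and recombines.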

LuminoCity: a 3D printed, illuminated city generated from LADAR data

Published in:
TePRA 2014: IEEE Int. Conf. on Tech. for Practical Robot Appl., 14-15 April 2014.

Summary

In this work, we describe LuminoCity, a novel three-dimensional data display. A 3D printed model of Cambridge, MA was generated from LADAR data. A translucent plastic model was then cast from a mold of the 3D printed model. We developed a display system that projects data onto the translucent model, supporting a wide range of overlays on the city, including satellite imagery and network traffic.

Strategic evolution of adversaries against temporal platform diversity active cyber defenses

Published in:
2014 Spring Simulation Multi-Conference, SpringSim 2014, 13-16 April 2014.

Summary

Adversarial dynamics are a critical facet within the cyber security domain, in which there exists a co-evolution between attackers and defenders in any given threat scenario. While defenders leverage capabilities to minimize the potential impact of an attack, the adversary is simultaneously developing countermeasures to the observed defenses. In this study, we develop a set of tools to model the adaptive strategy formulation of an intelligent actor against an active cyber defensive system. We encode strategies as binary chromosomes representing finite state machines that evolve according to Holland's genetic algorithm. We study the strategic considerations including overall actor reward balanced against the complexity of the determined strategies. We present a series of simulation results demonstrating the ability to automatically search a large strategy space for optimal resultant fitness against a variety of counter-strategies.
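The genetic-algorithm search over binary chromosomes can be sketched with a minimal Holland-style GA: selection, one-point crossover, and bitwise mutation. The fitness function below (match against a fixed target strategy) and all parameters are illustrative stand-ins, not the paper's simulation:

```python
# Sketch: a minimal genetic algorithm over binary chromosomes.
import random

random.seed(0)
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]   # stand-in for an optimal strategy

def fitness(chrom):
    # Toy fitness: agreement with the target strategy, bit by bit.
    return sum(g == t for g, t in zip(chrom, TARGET))

def evolve(pop_size=30, generations=60, p_mut=0.05):
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(TARGET))
            child = a[:cut] + b[cut:]             # one-point crossover
            child = [1 - g if random.random() < p_mut else g
                     for g in child]              # bitwise mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

In the paper, the chromosome instead encodes an FSM strategy and fitness comes from simulated play against the defender, but the evolutionary loop has this same structure.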