A language-independent approach to automatic text difficulty assessment for second-language learners
August 4, 2013
Conference Paper
Published in:
Proc. 2nd Workshop on Predicting and Improving Text Readability for Target Reader Populations, 4-9 August 2013.
Summary
In this paper we introduce a new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale. We demonstrate that reading level assessment is a discriminative problem that is best suited to regression. Our baseline uses z-normalized shallow length features and TF-LOG weighted bag-of-words vectors for Arabic, Dari, English, and Pashto. We compare Support Vector Machines and the Margin-Infused Relaxed Algorithm, using mean squared error as the evaluation measure. We also analyze which features are most predictive of a given level.
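The feature pipeline named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation. The summary does not define TF-LOG or list the length features, so the sketch assumes a weight of 1 + log(tf) for each term and three illustrative length features (words per sentence, characters per word, total words). All function names are hypothetical, and the regression models themselves (SVM, MIRA) are left out.

```python
import math
from collections import Counter


def shallow_length_features(text):
    """Return [words per sentence, chars per word, word count].

    Assumed feature set for illustration; sentences are split naively on '.'.
    """
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return [avg_sentence_len, avg_word_len, float(len(words))]


def z_normalize(values):
    """Scale one feature column to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def tf_log_vector(tokens):
    """Sparse bag-of-words vector with an assumed TF-LOG weight of 1 + log(tf)."""
    counts = Counter(tokens)
    return {term: 1.0 + math.log(count) for term, count in counts.items()}


def mean_squared_error(predicted, gold):
    """Average squared difference between predicted and gold levels."""
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    docs = ["The cat sat. The dog ran.", "Interagency assessment requires calibration."]
    word_counts = [shallow_length_features(d)[2] for d in docs]
    print(z_normalize(word_counts))
    print(tf_log_vector(docs[0].lower().split()))
    print(mean_squared_error([1.0, 2.5], [1.0, 3.0]))
```

Z-normalization is applied per feature column across the document collection, so that length features on different scales contribute comparably to a regression model. Treating ILR levels as numeric regression targets is what makes mean squared error a natural evaluation measure here.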