A language-independent approach to automatic text difficulty assessment for second-language learners

August 4, 2013

Conference Paper

Author:

Wade Shen

…

Published in:

Proc. 2nd Workshop on Predicting and Improving Text Readability for Target Reader Populations, 4-9 August 2013.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

In this paper we introduce a new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale. We demonstrate that reading level assessment is a discriminative problem that is best suited to regression. Our baseline combines z-normalized shallow length features with TF-LOG weighted bag-of-words vectors for Arabic, Dari, English, and Pashto. We compare Support Vector Machines and the Margin-Infused Relaxed Algorithm, evaluating both by mean squared error. We also analyze which features are most predictive of a given level.
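As a rough illustration of the feature types named in the summary, the following minimal sketch computes shallow length features, z-normalizes them across a corpus, builds log-scaled term-frequency vectors, and scores predictions by mean squared error. It is not the authors' implementation. The function names, the particular length features (word count and mean word length), the whitespace tokenization, and the reading of TF-LOG as log(1 + tf) are all assumptions made for this example.

```python
import math
from collections import Counter


def length_features(text):
    """Shallow length features: word count and mean word length.

    The choice of these two features is an assumption for illustration.
    """
    words = text.split()  # whitespace tokenization (assumed, language-agnostic)
    n = len(words)
    mean_len = sum(len(w) for w in words) / n if n else 0.0
    return [float(n), mean_len]


def z_normalize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)
    ]
    # Constant columns (std == 0) are mapped to 0.0 to avoid division by zero.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def tf_log_vector(text):
    """Sparse bag-of-words vector with log-scaled term frequency.

    Uses log(1 + tf); this exact weighting formula is an assumption.
    """
    counts = Counter(text.lower().split())
    return {term: math.log(1 + tf) for term, tf in counts.items()}


def mean_squared_error(predicted, actual):
    """Mean squared error between predicted and reference levels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
```

In a setup like this, the normalized length features and the term vector for each document would be concatenated and passed to a regression model, whose predicted levels are then compared against reference ILR levels with `mean_squared_error`.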