Exploiting morphological, grammatical, and semantic correlates for improved text difficulty assessment

June 26, 2014

Conference Paper

Authors:

Elizabeth E. Salesky

…

Wade Shen

Published in:

Proc. 9th Workshop on Innovative Use of NLP for Building Educational Applications, 26 June 2014, pp. 155-162.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

We present a low-resource, language-independent system for text difficulty assessment. We replicate and improve upon the baseline of Shen et al. (2013) on the Interagency Language Roundtable (ILR) scale. We show that adding morphological, information-theoretic, and language modeling features to a traditional readability baseline substantially improves performance. We run experiments on Arabic, Dari, English, and Pashto using the Margin-Infused Relaxed Algorithm (MIRA) and Support Vector Machines (SVMs), and provide a detailed analysis of the results.
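
To make the idea of language-independent difficulty features concrete, the following is a minimal sketch of the kind of surface and information-theoretic statistics such a system might compute per document. It is an illustration only, not the feature set used in the paper: the function name `difficulty_features`, the regex-based tokenization, and the four features chosen are all assumptions. In a full system, vectors like these would be passed to a regressor such as an SVM or MIRA and trained against ILR levels.

```python
import math
import re
from collections import Counter


def difficulty_features(text: str) -> dict[str, float]:
    """Compute a few surface features for one document (hypothetical helper)."""
    # Assumption: sentences end at ., !, or ?; real systems need per-language rules.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a token is a run of Unicode word characters.
    words = re.findall(r"\w+", text.lower())
    if not words or not sentences:
        return {
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
            "word_entropy": 0.0,
        }

    counts = Counter(words)
    total = len(words)
    # Shannon entropy (in bits) of the document's unigram distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    return {
        "avg_sentence_length": total / len(sentences),  # words per sentence
        "avg_word_length": sum(len(w) for w in words) / total,  # characters per word
        "type_token_ratio": len(counts) / total,  # distinct words / all words
        "word_entropy": entropy,
    }


if __name__ == "__main__":
    print(difficulty_features("The cat sat. The dog ran quickly!"))
```

None of these features needs a parser or lexicon, which is why statistics of this kind suit low-resource settings. Average word length and type-token ratio are only rough proxies for morphological complexity.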