A fun and engaging interface for crowdsourcing named entities

May 23, 2016

Conference Paper

Author:

Kara B. Greenfield

…

Published in:

10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

A fun and engaging interface for crowdsourcing named entities

Summary

There are many current problems in natural language processing that are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus has already been annotated. Annotating corpora for natural language processing tasks is typically a time consuming and expensive process. In this paper, we provide a case study in using crowd sourcing to curate an in-domain corpus for named entity recognition, a common problem in natural language processing. In particular, we present our use of fun, engaging user interfaces as a way to entice workers to partake in our crowd sourcing task while avoiding inflating our payments in a way that would attract more mercenary workers than conscientious ones. Additionally, we provide a survey of alternate interfaces for collecting annotations of named entities and compare our approach to those systems.