Content+context=classification: examining the roles of social interactions and linguistic content in Twitter user classification

August 24, 2014

Conference Paper

Author:

William M. Campbell

…

Kara B. Greenfield

Published in:

Proc. Second Workshop on Natural Language Processing for Social Media, SocialNLP, 24 August 2014, pp. 59-65.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems

Summary

Twitter users demonstrate many characteristics via their online presence. Connections, community memberships, and communication patterns reveal both idiosyncratic and general properties of users. In addition, the content of tweets can be critical for distinguishing the role and importance of a user. In this work, we explore Twitter user classification using context and content cues. We construct a rich graph structure induced by hashtags and social communications in Twitter. From this graph structure we derive three kinds of features: centrality, communities, and local flow of information. In addition, we perform detailed content analysis on tweets, looking at offensiveness and topics. We then examine user classification and the role of feature types (context, content) and learning methods (propositional, relational) through a series of experiments on annotated data. Our work contrasts with prior approaches in that we use relational learning and alternative, non-specialized feature sets. Our goal is to understand how both content and context are predictive of user characteristics. Experiments demonstrate that the best user classification performance is obtained with relational learning using varying content and context features.
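
As a rough illustration of what a graph induced by hashtags and social communications might look like, the following minimal sketch builds an undirected graph from (author, text) pairs and computes a simple degree centrality per node. It is not the paper's construction: the `user:`/`tag:` node prefixes, the mention and hashtag regular expressions, the function names, and the choice of degree centrality as the centrality measure are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed, simplified patterns for @mentions and #hashtags.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def build_graph(tweets):
    """Build an undirected adjacency map from (author, text) pairs.

    Nodes are users ("user:<name>") and hashtags ("tag:<name>").
    An edge links an author to every user they mention and to every
    hashtag they use. This is a hypothetical construction for illustration.
    """
    adj = defaultdict(set)
    for author, text in tweets:
        u = "user:" + author.lower()
        adj[u]  # make sure the author is a node even with no links
        for name in MENTION.findall(text):
            v = "user:" + name.lower()
            if v != u:  # ignore self-mentions
                adj[u].add(v)
                adj[v].add(u)
        for tag in HASHTAG.findall(text):
            h = "tag:" + tag.lower()
            adj[u].add(h)
            adj[h].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Degree of each node divided by (n - 1), where n is the node count."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


if __name__ == "__main__":
    sample = [
        ("alice", "hi @bob #nlp"),
        ("bob", "#nlp rocks"),
        ("carol", "nothing to link here"),
    ]
    graph = build_graph(sample)
    for node, score in sorted(degree_centrality(graph).items()):
        print(node, round(score, 3))
```

Per-node scores of this kind could then serve as context features alongside content features in a classifier; community membership and information-flow features would need additional steps not sketched here.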