Publications


Improving long-text authorship verification via model selection and data tuning

Published in:
Proc. 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL2023, 5 May 2023, pp. 28-37.

Summary

Authorship verification links texts written by the same author without requiring a model per author, making it useful for deanonymizing users who spread text with malicious intent. Recent advances in Transformer-based language models hold great promise for authorship verification, though short context lengths and non-diverse training regimes present challenges for their practical application. In this work, we investigate the effect of these challenges on a Cross-Encoder Transformer-based authorship verification system under multiple conditions. We perform experiments with four Transformer backbones on differently tuned variants of fanfiction data and find that our BigBird pipeline outperforms Longformer, RoBERTa, and ELECTRA and performs competitively against the official top-ranked system from the PAN evaluation. We also examine the effect of authors and fandoms not seen during training on model performance. Through this, we find that fandom has the greatest influence on true trials (pairs of texts written by the same author) and that a training dataset balanced in terms of class and fandom performs the most consistently.
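
The verification model is framed as a Cross-Encoder: both texts of a trial are fed jointly to a single long-context Transformer (here BigBird), which predicts whether they share an author. The sketch below illustrates that setup with the Hugging Face Transformers library; the checkpoint name, 4096-token limit, and untrained two-label head are assumptions for illustration, not the authors' released pipeline.

# Illustrative sketch (not the paper's code): scoring a text pair with a
# Cross-Encoder on a BigBird backbone via Hugging Face Transformers.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "google/bigbird-roberta-base"  # assumed backbone checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def same_author_probability(text_a: str, text_b: str) -> float:
    """Encode both texts in one input (cross-encoder style) and return the
    probability that they were written by the same author (label 1)."""
    inputs = tokenizer(
        text_a,
        text_b,
        truncation=True,
        max_length=4096,  # BigBird's sparse attention supports long contexts
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Example usage with placeholder texts; a real pipeline would first fine-tune
# the classification head on labeled same-author / different-author pairs.
print(same_author_probability("First long fanfiction excerpt ...",
                              "Second long fanfiction excerpt ..."))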

