Radu Ionescu

Transferring BERT-like Transformers' Knowledge for Authorship Verification

Dec 09, 2021

Andrei Manolache, Florin Brad, Elena Burceanu, Antonio Barbalau, Radu Ionescu, Marius Popescu

Abstract: Research on identifying the author of a text spans several decades and has drawn on linguistics, statistics, and, more recently, machine learning. Inspired by the impressive performance gains of transformer models across a broad range of natural language processing tasks, and by the recent availability of the large-scale PAN authorship dataset, we first study the effectiveness of several BERT-like transformers for the task of authorship verification. These models consistently achieve very high scores. Next, we empirically show that they focus on topical clues rather than on the author's writing style, exploiting existing biases in the dataset. To address this problem, we provide new splits for PAN-2020 in which training and test data are sampled from disjoint topics or disjoint authors. Finally, we introduce DarkReddit, a dataset with a different input data distribution, and use it to analyze the domain generalization performance of models in a low-data regime, as well as how performance varies when the proposed PAN-2020 splits are used for fine-tuning. We show that these splits can enhance the models' ability to transfer knowledge to a new, significantly different dataset.
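A disjoint split of the kind described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released procedure. It assumes a hypothetical record layout in which each verification pair carries two group labels (author IDs for an author-disjoint split, or topic labels for a topic-disjoint split). The function name `disjoint_group_split` and its parameters are hypothetical. Pairs whose two groups fall on opposite sides of the split are dropped, since keeping them would leak a group across train and test; the paper's actual handling of such pairs may differ.

```python
import random
from typing import List, Set, Tuple

# Hypothetical record layout: (text_a, text_b, group_a, group_b, same_author),
# where group_* is an author ID or a topic label.
Pair = Tuple[str, str, str, str, bool]


def disjoint_group_split(
    pairs: List[Pair], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Split verification pairs so that no group label occurs in both train and test.

    Pairs whose two groups land on different sides of the split are discarded.
    """
    groups: Set[str] = set()
    for _, _, g_a, g_b, _ in pairs:
        groups.update((g_a, g_b))

    # Sort before shuffling so the split is reproducible for a given seed.
    ordered = sorted(groups)
    random.Random(seed).shuffle(ordered)
    n_test = max(1, int(len(ordered) * test_fraction))
    test_groups = set(ordered[:n_test])

    train: List[Pair] = []
    test: List[Pair] = []
    for pair in pairs:
        sides = {pair[2] in test_groups, pair[3] in test_groups}
        if sides == {True}:
            test.append(pair)   # both groups are test-only
        elif sides == {False}:
            train.append(pair)  # both groups are train-only
        # otherwise the pair straddles the split and is dropped
    return train, test


if __name__ == "__main__":
    toy_pairs = [
        ("t1", "t2", "alice", "alice", True),
        ("t3", "t4", "bob", "bob", True),
        ("t5", "t6", "alice", "bob", False),
        ("t7", "t8", "carol", "dave", False),
    ]
    train, test = disjoint_group_split(toy_pairs, test_fraction=0.5, seed=1)
    print(len(train), "train pairs,", len(test), "test pairs")
```

Splitting on group labels first and only then assigning pairs is what guarantees disjointness; a random split over pairs alone would let the same author or topic appear on both sides, which is the kind of bias the abstract describes models exploiting.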

* 16 pages, 3 figures

Challenges in Representation Learning: A report on three machine learning contests

Jul 01, 2013

Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee(+18 more)

Abstract: The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We also provide suggestions for organizers of future challenges and comment on what kind of knowledge can be gained from machine learning competitions.

* 8 pages, 2 figures
