Petar Ivanov

SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

Apr 22, 2024

Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold(+5 more)

Abstract: We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task that determines whether a text was written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual track. Subtask B is to detect the exact source of a text, discerning whether it was written by a human or generated by a specific LLM. Subtask C aims to identify the change point within a text at which authorship transitions from human to machine. The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30). In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For all subtasks, the best systems used LLMs.

* Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
* 23 pages, 12 tables
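Subtask C is a boundary-detection problem rather than a classification one, so it needs a distance-style score. The snippet below is a minimal sketch, assuming each text's boundary is represented as the index of the first machine-generated word and that predictions are scored with mean absolute error (MAE). The function name `boundary_mae` is hypothetical and is not taken from the official task scorer.

```python
from typing import Sequence


def boundary_mae(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Mean absolute error between gold and predicted human-to-machine boundaries.

    Assumption for this sketch: each value is the 0-based index of the first
    machine-generated word in the corresponding text.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    # Average the absolute word-offset error over all texts.
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)


print(boundary_mae([12, 40, 7], [10, 40, 9]))
```

A lower score means predicted change points sit closer to the true ones; an exact match on every text gives 0.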

M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

Feb 17, 2024

Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold(+4 more)

Abstract: The advent of Large Language Models (LLMs) has brought an unprecedented surge of machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. Identifying such content and distinguishing it from genuine human-written text is critical for combating disinformation, preserving the integrity of education and science, and maintaining trust in communication. In this work, we address this problem by introducing M4GT-Bench, a new multilingual, multi-domain, and multi-generator benchmark for MGT detection. It covers three task formulations: (1) monolingual and multilingual binary MGT detection; (2) multi-way detection, which identifies the particular model that generated a text; and (3) human-machine mixed text detection, where the word boundary delimiting MGT from human-written content must be determined. Human evaluation on Task 2 shows below-random-guess performance, demonstrating how difficult it is to distinguish individual LLMs. Results are consistently promising when training and test data come from the same domains or generators.

* 28 pages
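To make formulation (3) concrete, a mixed text can be thought of as a human-written prefix followed by a machine-generated continuation, labeled with the position where the two meet. The sketch below assumes whitespace tokenization and a 0-based word-index label; both are simplifying assumptions, and the helper names `make_mixed_example` and `words_before_boundary` are hypothetical rather than part of M4GT-Bench.

```python
from typing import List, Tuple


def make_mixed_example(human_text: str, machine_continuation: str) -> Tuple[str, int]:
    """Join a human-written prefix with a machine-generated continuation.

    Returns the combined text and the boundary label: the 0-based index,
    over whitespace-separated words, of the first machine-generated word.
    """
    human_words: List[str] = human_text.split()
    machine_words: List[str] = machine_continuation.split()
    text = " ".join(human_words + machine_words)
    return text, len(human_words)


def words_before_boundary(text: str, boundary: int) -> str:
    """Recover the human-written prefix from a mixed text and its boundary label."""
    return " ".join(text.split()[:boundary])


text, boundary = make_mixed_example(
    "The committee met on Tuesday",
    "and agreed to publish the report next month.",
)
print(boundary)
print(words_before_boundary(text, boundary))
```

Representing the label as a word index keeps it independent of spacing and punctuation details, although a real benchmark may tokenize differently.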

M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

May 24, 2023

Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Alham Fikri Aji(+1 more)

Abstract: Large language models (LLMs) have demonstrated a remarkable capability to generate fluent responses to a wide variety of user queries, but this has also raised concerns about the potential misuse of such texts in journalistic, educational, and academic contexts. In this work, we aim to develop automatic systems to identify machine-generated text and to detect potential misuse. We first introduce M4, a large-scale multi-generator, multi-domain, and multi-lingual benchmark corpus for machine-generated text detection. Using this dataset, we experiment with a number of methods and show that detectors struggle to generalize to unseen examples that come from different domains or are generated by different large language models. In such cases, detectors tend to misclassify machine-generated text as human-written. These results show that the problem is far from solved and that there is a lot of room for improvement. We believe that M4, which covers different generators, domains, and languages, will enable future research towards more robust approaches to this pressing societal problem. The M4 dataset is available at https://github.com/mbzuai-nlp/M4.

* 11 pages
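The cross-domain finding above is usually measured by training on some domains and testing on one that was held out. The following is a minimal sketch of a leave-one-domain-out split, assuming each record is a hypothetical `(text, label, domain)` tuple with label 1 for machine-generated and 0 for human-written text. It also computes machine-class recall, since a low value corresponds to machine-generated text being misclassified as human-written. The records are toy placeholders, not M4 data, and this is not the paper's evaluation code.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (text, label, domain); label 1 = machine, 0 = human.
Record = Tuple[str, int, str]


def leave_one_domain_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held_out_domain, train, test) splits, one per domain."""
    by_domain: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_domain[record[2]].append(record)
    for held_out in sorted(by_domain):
        # Train on every other domain; test only on the held-out one.
        train = [r for r in records if r[2] != held_out]
        test = by_domain[held_out]
        yield held_out, train, test


def machine_recall(labels: List[int], predictions: List[int]) -> float:
    """Fraction of machine-generated texts (label 1) that the detector flags as machine."""
    machine_preds = [p for y, p in zip(labels, predictions) if y == 1]
    return sum(machine_preds) / len(machine_preds) if machine_preds else float("nan")


# Toy placeholder records, for illustration only.
records: List[Record] = [
    ("text a", 0, "wikipedia"),
    ("text b", 1, "wikipedia"),
    ("text c", 1, "reddit"),
    ("text d", 0, "arxiv"),
]
for domain, train, test in leave_one_domain_out(records):
    print(domain, len(train), len(test))
```

Splitting by domain, rather than randomly, ensures the test domain is never seen during training, which is the unseen-domain setting the abstract refers to.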
