Defne Altiok

Pitfalls of Conversational LLMs on News Debiasing

Apr 09, 2024

Ipek Baris Schlicht, Defne Altiok, Maryanne Taouk, Lucie Flek

Abstract: This paper addresses debiasing in news editing and evaluates the effectiveness of conversational Large Language Models in this task. We designed an evaluation checklist tailored to news editors' perspectives, obtained generated texts from three popular conversational models using a subset of a publicly available media bias dataset, and evaluated the texts according to the checklist. Furthermore, we examined the models as evaluators of the quality of debiased model outputs. Our findings indicate that none of the LLMs are perfect at debiasing. Notably, some models, including ChatGPT, introduced unnecessary changes that may alter the author's style and create misinformation. Lastly, we show that the models do not perform as proficiently as domain experts in evaluating the quality of debiased outputs.

* Accepted at the DELITE workshop, co-located with COLING/LREC

DWReCO at CheckThat! 2023: Enhancing Subjectivity Detection through Style-based Data Sampling

Jul 07, 2023

Ipek Baris Schlicht, Lynn Khellaf, Defne Altiok

Abstract: This paper describes our submission to the subjectivity detection task at the CheckThat! Lab. To tackle class imbalance in the task, we generated additional training material with GPT-3 models, using prompts in different styles drawn from a subjectivity checklist based on a journalistic perspective. We used the extended training set to fine-tune language-specific transformer models. Our experiments in English, German and Turkish demonstrate that different subjective styles are effective across all languages. In addition, we observe that style-based oversampling performs better than paraphrasing in Turkish and English. Lastly, the GPT-3 models sometimes produce lacklustre results when generating style-based texts in non-English languages.

* Accepted to CLEF CheckThat! Lab
