Abstract: The growing volume of online content prompts the need to adopt algorithmic systems of information curation. These systems range from web search engines to recommender systems and are integral to helping users stay informed about important societal developments. However, unlike journalistic editing, algorithmic information curation systems (AICSs) are known to be subject to different forms of malperformance, which make them vulnerable to manipulation. The risk of manipulation is particularly prominent when AICSs have to deal with information about false claims that underpin propaganda campaigns of authoritarian regimes. Using the Russian disinformation campaign concerning the US biolabs in Ukraine as a case study, we investigate how one of the most commonly used forms of AICSs, web search engines, curates misinformation-related content. To this end, we conducted virtual agent-based algorithm audits of Google, Bing, and Yandex search outputs in June 2022. Our findings highlight the troubling performance of search engines. Although some search engines, such as Google, were less likely to return misinformation, across all languages and locations the three search engines still mentioned or promoted a considerable share of false content (33% on Google, 44% on Bing, and 70% on Yandex). We also find significant disparities in misinformation exposure based on the language of search, with all search engines presenting a higher number of false stories in Russian. Location matters as well, with users from Germany being more likely to be exposed to search results promoting false information. These observations stress the possibility of AICSs being vulnerable to manipulation, particularly in the case of unfolding propaganda campaigns, and underline the importance of monitoring the performance of these systems to prevent such manipulation.
Abstract: This article presents a comparative analysis of the ability of two large language model (LLM)-based chatbots, ChatGPT and Bing Chat (recently rebranded to Microsoft Copilot), to detect the veracity of political information. We use AI auditing methodology to investigate how the chatbots evaluate true, false, and borderline statements on five topics: COVID-19, Russian aggression against Ukraine, the Holocaust, climate change, and LGBTQ+-related debates. We compare how the chatbots perform in high- and low-resource languages by using prompts in English, Russian, and Ukrainian. Furthermore, we explore the ability of the chatbots to evaluate statements according to the political communication concepts of disinformation, misinformation, and conspiracy theory, using definition-oriented prompts. We also systematically test how such evaluations are influenced by source bias, which we model by attributing specific claims to various political and social actors. The results show high performance of ChatGPT on the baseline veracity evaluation task, with 72 percent of cases evaluated correctly on average across languages without pre-training. Bing Chat performed worse, with 67 percent accuracy. We observe significant disparities in how the chatbots evaluate prompts in high- and low-resource languages and how they adapt their evaluations to political communication concepts, with ChatGPT providing more nuanced outputs than Bing Chat. Finally, we find that for some veracity detection-related tasks, the performance of the chatbots varied depending on the topic of the statement or the source to which it was attributed. These findings highlight the potential of LLM-based chatbots in tackling different forms of false information in online environments, but also point to substantial variation in how this potential is realized due to specific factors, such as the language of the prompt or the topic of the statement.