Abstract:The rapid dissemination of information through social media and the Internet has posed a significant challenge for fact-checking, among others in identifying check-worthy claims that fact-checkers should pay attention to, i.e. filtering claims needing fact-checking from a large pool of sentences. This challenge has stressed the need to focus on determining the priority of claims, specifically which claims are worth to be fact-checked. Despite advancements in this area in recent years, the application of large language models (LLMs), such as GPT, has only recently drawn attention in studies. However, many open-source LLMs remain underexplored. Therefore, this study investigates the application of eight prominent open-source LLMs with fine-tuning and prompt engineering to identify check-worthy statements from political transcriptions. Further, we propose a two-step data pruning approach to automatically identify high-quality training data instances for effective learning. The efficiency of our approach is demonstrated through evaluations on the English language dataset as part of the check-worthiness estimation task of CheckThat! 2024. Further, the experiments conducted with data pruning demonstrate that competitive performance can be achieved with only about 44\% of the training data. Our team ranked first in the check-worthiness estimation task in the English language.
Abstract:This paper discusses the effectiveness of leveraging Chatbot: Generative Pre-trained Transformer (ChatGPT) versions 3.5 and 4 for analyzing research papers for effective writing of scientific literature surveys. The study selected the \textit{Application of Artificial Intelligence in Breast Cancer Treatment} as the research topic. Research papers related to this topic were collected from three major publication databases Google Scholar, Pubmed, and Scopus. ChatGPT models were used to identify the category, scope, and relevant information from the research papers for automatic identification of relevant papers related to Breast Cancer Treatment (BCT), organization of papers according to scope, and identification of key information for survey paper writing. Evaluations performed using ground truth data annotated using subject experts reveal, that GPT-4 achieves 77.3\% accuracy in identifying the research paper categories and 50\% of the papers were correctly identified by GPT-4 for their scopes. Further, the results demonstrate that GPT-4 can generate reasons for its decisions with an average of 27\% new words, and 67\% of the reasons given by the model were completely agreeable to the subject experts.
Abstract:A prominent media topic in the UK in the early 2020s is the energy crisis affecting the UK and most of Europe. It brings into a single public debate issues of energy dependency and sustainability, fair distribution of economic burdens and cost of living, as well as climate change, risk, and sustainability. In this paper, we investigate the public discourse around the energy crisis and cost of living to identify how these pivotal and contradictory issues are reconciled in this debate and to identify which social actors are involved and the role they play. We analyse a document corpus retrieved from UK newspapers from January 2014 to March 2023. We apply a variety of natural language processing and data visualisation techniques to identify key topics, novel trends, critical social actors, and the role they play in the debate, along with the sentiment associated with those actors and topics. We combine automated techniques with manual discourse analysis to explore and validate the insights revealed in this study. The findings verify the utility of these techniques by providing a flexible and scalable pipeline for discourse analysis and providing critical insights for cost of living - energy crisis nexus research.
Abstract:Automated fact-checking has drawn considerable attention over the past few decades due to the increase in the diffusion of misinformation on online platforms. This is often carried out as a sequence of tasks comprising (i) the detection of sentences circulating in online platforms which constitute claims needing verification, followed by (ii) the verification process of those claims. This survey focuses on the former, by discussing existing efforts towards detecting claims needing fact-checking, with a particular focus on multilingual data and methods. This is a challenging and fertile direction where existing methods are yet far from matching human performance due to the profoundly challenging nature of the issue. Especially, the dissemination of information across multiple social platforms, articulated in multiple languages and modalities demands more generalized solutions for combating misinformation. Focusing on multilingual misinformation, we present a comprehensive survey of existing multilingual claim detection research. We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem, verifiability, priority, and similarity. Further, we present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
Abstract:The advancement of machine learning and symbolic approaches have underscored their strengths and weaknesses in Natural Language Processing (NLP). While machine learning approaches are powerful in identifying patterns in data, they often fall short in learning commonsense and the factual knowledge required for the NLP tasks. Meanwhile, the symbolic methods excel in representing knowledge-rich data. However, they struggle to adapt dynamic data and generalize the knowledge. Bridging these two paradigms through hybrid approaches enables the alleviation of weaknesses in both while preserving their strengths. Recent studies extol the virtues of this union, showcasing promising results in a wide range of NLP tasks. In this paper, we present an overview of hybrid approaches used for NLP. Specifically, we delve into the state-of-the-art hybrid approaches used for a broad spectrum of NLP tasks requiring natural language understanding, generation, and reasoning. Furthermore, we discuss the existing resources available for hybrid approaches for NLP along with the challenges, offering a roadmap for future directions.