Title: HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims

Feb 17, 2025

Michiel van der Meer, Pavel Korshunov, Sébastien Marcel, Lonneke van der Plas

Abstract: Misinformation can be countered with fact-checking, but the process is costly and slow. Identifying checkworthy claims is the first step, where automation can help scale fact-checkers' efforts. However, detection methods struggle with content that is 1) multimodal, 2) from diverse domains, and 3) synthetic. We introduce HintsOfTruth, a public dataset for multimodal checkworthiness detection with 27K real-world and synthetic image/claim pairs. The mix of real and synthetic data makes this dataset unique and ideal for benchmarking detection methods. We compare fine-tuned and prompted Large Language Models (LLMs). We find that well-configured lightweight text-based encoders perform comparably to multimodal models, but the former focus only on identifying non-claim-like content. Multimodal LLMs can be more accurate but come at a significant computational cost, making them impractical for large-scale applications. When faced with synthetic data, multimodal models perform more robustly.
