Anton Lavrouk

What are Foundation Models Cooking in the Post-Soviet World?

Feb 25, 2025

Anton Lavrouk, Tarek Naous, Alan Ritter, Wei Xu

Abstract: The culture of the Post-Soviet states is complex, shaped by a turbulent history that continues to influence current events. In this study, we investigate the Post-Soviet cultural food knowledge of foundation models by constructing BORSch, a multimodal dataset encompassing 1147 and 823 dishes in the Russian and Ukrainian languages, centered around the Post-Soviet region. We demonstrate that leading models struggle to correctly identify the origins of dishes from Post-Soviet nations in both text-only and multimodal Question Answering (QA), instead over-predicting countries linked to the language the question is asked in. Through analysis of pretraining data, we show that these results can be explained by misleading dish-origin co-occurrences, along with linguistic phenomena such as Russian-Ukrainian code mixing. Finally, to move beyond QA-based assessments, we test models' abilities to produce accurate visual descriptions of dishes. The weak correlation between this task and QA suggests that QA alone may be insufficient as an evaluation of cultural understanding. To foster further research, we will make BORSch publicly available at https://github.com/alavrouk/BORSch.

Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation

Feb 06, 2024

Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu

Abstract: The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation. In the Stanceosaurus 2.0 iteration, we extend this framework to encompass Russian and Spanish. The former is of current significance due to prevalent misinformation amid escalating tensions with the West and the violent incursion into Ukraine. The latter, meanwhile, represents an enormous community that has been largely overlooked on major social media platforms. By incorporating an additional 3,874 Spanish and Russian tweets over 41 misinformation claims, our objective is to support research focused on these issues. To demonstrate the value of this data, we employed zero-shot cross-lingual transfer on multilingual BERT, yielding results on par with the initial Stanceosaurus study with a macro F1 score of 43 for both languages. This underlines the viability of stance classification as an effective tool for identifying multicultural misinformation.

* WNUT2024
