Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Artem Abzaliev

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Jun 10, 2024

David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja(+65 more)

Figure 1 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Figure 2 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Figure 3 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Figure 4 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Abstract:Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.

Via

Access Paper or Ask Questions

Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification

Apr 29, 2024

Artem Abzaliev, Humberto Pérez Espinosa, Rada Mihalcea

Abstract:Similar to humans, animals make extensive use of verbal and non-verbal forms of communication, including a large range of audio signals. In this paper, we address dog vocalizations and explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks that find parallels in human-centered tasks in speech recognition. We specifically address four tasks: dog recognition, breed identification, gender classification, and context grounding. We show that using speech embedding representations significantly improves over simpler classification baselines. Further, we also find that models pre-trained on large human speech acoustics can provide additional performance boosts on several tasks.

* to be published in LREC-COLING 2024

Via

Access Paper or Ask Questions

DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues

Nov 07, 2023

David Q. Sun, Artem Abzaliev, Hadas Kotek, Zidi Xiu, Christopher Klein, Jason D. Williams

Figure 1 for DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues

Figure 2 for DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues

Figure 3 for DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues

Figure 4 for DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues

Abstract:Controversy is a reflection of our zeitgeist, and an important aspect to any discourse. The rise of large language models (LLMs) as conversational systems has increased public reliance on these systems for answers to their various questions. Consequently, it is crucial to systematically examine how these models respond to questions that pertaining to ongoing debates. However, few such datasets exist in providing human-annotated labels reflecting the contemporary discussions. To foster research in this area, we propose a novel construction of a controversial questions dataset, expanding upon the publicly released Quora Question Pairs Dataset. This dataset presents challenges concerning knowledge recency, safety, fairness, and bias. We evaluate different LLMs using a subset of this dataset, illuminating how they handle controversial issues and the stances they adopt. This research ultimately contributes to our understanding of LLMs' interaction with controversial issues, paving the way for improvements in their comprehension and handling of complex societal debates.

* Accepted to EMNLP Industry Track 2023

Via

Access Paper or Ask Questions

A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models

May 21, 2023

Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi(+12 more)

Abstract:Recent progress in large language models has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that ``it's all been solved.'' Not surprisingly, this has in turn made many NLP researchers -- especially those at the beginning of their career -- wonder about what NLP research area they should focus on. This document is a compilation of NLP research directions that are rich for exploration, reflecting the views of a diverse group of PhD students in an academic research lab. While we identify many research areas, many others exist; we do not cover those areas that are currently addressed by LLMs but where LLMs lag behind in performance, or those focused on LLM development. We welcome suggestions for other research directions to include: https://bit.ly/nlp-era-llm

Via

Access Paper or Ask Questions

Adaptable Claim Rewriting with Offline Reinforcement Learning for Effective Misinformation Discovery

Oct 14, 2022

Ashkan Kazemi, Artem Abzaliev, Naihao Deng, Rui Hou, Davis Liang, Scott A. Hale, Verónica Pérez-Rosas, Rada Mihalcea

Figure 1 for Adaptable Claim Rewriting with Offline Reinforcement Learning for Effective Misinformation Discovery

Figure 2 for Adaptable Claim Rewriting with Offline Reinforcement Learning for Effective Misinformation Discovery

Figure 3 for Adaptable Claim Rewriting with Offline Reinforcement Learning for Effective Misinformation Discovery

Figure 4 for Adaptable Claim Rewriting with Offline Reinforcement Learning for Effective Misinformation Discovery

Abstract:We propose a novel system to help fact-checkers formulate search queries for known misinformation claims and effectively search across multiple social media platforms. We introduce an adaptable rewriting strategy, where editing actions (e.g., swap a word with its synonym; change verb tense into present simple) for queries containing claims are automatically learned through offline reinforcement learning. Specifically, we use a decision transformer to learn a sequence of editing actions that maximize query retrieval metrics such as mean average precision. Through several experiments, we show that our approach can increase the effectiveness of the queries by up to 42\% relatively, while producing editing action sequences that are human readable, thus making the system easy to use and explain.

Via

Access Paper or Ask Questions