Abstract: Social intelligence, the ability to interpret emotions, intentions, and behaviors, is essential for effective communication and adaptive responses. As robots and AI systems become more prevalent in caregiving, healthcare, and education, the demand for AI that can interact naturally with humans grows. However, creating AI that seamlessly integrates multiple modalities, such as vision and speech, remains a challenge. Current video-based methods for social intelligence rely on general video recognition or emotion recognition techniques and often overlook the elements unique to human interactions. To address this, we propose the Looped Video Debating (LVD) framework, which integrates Large Language Models (LLMs) with visual cues such as facial expressions and body movements to enhance the transparency and reliability of question answering on videos of human interaction. Our results on the Social-IQ 2.0 benchmark show that LVD achieves state-of-the-art performance without fine-tuning. Furthermore, supplementary human annotations on existing datasets provide insights into the model's accuracy and guide future improvements in AI-driven social intelligence.