Abstract: Socially intelligent AI that can understand and interact seamlessly with humans in everyday life is increasingly important as AI becomes more closely integrated with people's daily activities. However, current work on artificial social reasoning relies on language-only or language-dominant approaches to benchmarking and training models, resulting in systems that are improving at verbal communication but struggle with nonverbal social understanding. To address this limitation, we tap into a novel source of data rich in nonverbal and social interactions -- mime videos. Mime refers to the art of expression through gesture and movement without spoken words, which presents unique challenges and opportunities for interpreting nonverbal social communication. We contribute a new dataset, MimeQA, obtained by sourcing 221 videos from YouTube and rigorously annotating and verifying them, resulting in a benchmark with 101 videos and 806 question-answer pairs. Using MimeQA, we evaluate state-of-the-art video large language models (vLLMs) and find that their overall accuracy ranges from 15% to 30%. Our analysis reveals that vLLMs often fail to ground imagined objects and over-rely on the text prompt while ignoring subtle nonverbal interactions. Our data resources are released at https://github.com/MIT-MI/MimeQA to inspire future work on foundation models that embody true social intelligence capable of interpreting nonverbal human interactions.
Abstract: As the field of explainable AI (XAI) matures, calls for interactive explanations of (the outputs of) AI models are growing, but the state of the art predominantly focuses on static explanations. In this paper, we focus instead on interactive explanations framed as conflict resolution between agents (i.e. AI models and/or humans) by leveraging computational argumentation. Specifically, we define Argumentative eXchanges (AXs) for dynamically sharing, in multi-agent systems, information harboured in individual agents' quantitative bipolar argumentation frameworks towards resolving conflicts amongst the agents. We then deploy AXs in the XAI setting in which a machine and a human interact about the machine's predictions. We identify and assess several theoretical properties characterising AXs that are suitable for XAI. Finally, we instantiate AXs for XAI by defining various agent behaviours, e.g. capturing counterfactual patterns of reasoning in machines and highlighting the effects of cognitive biases in humans. We show experimentally (in a simulated environment) the comparative advantages of these behaviours in terms of conflict resolution, finding that the strongest argument may not always be the most effective.