For AI systems to be safely deployed in real-world settings such as hospitals, schools, and workplaces, they must be able to reason about the physical world: they need to understand the physical properties and affordances of available objects, how those objects can be manipulated, and how they interact with other physical objects. Physical commonsense reasoning is fundamentally a multi-sensory task, since physical properties are manifested through multiple modalities, including vision and acoustics. Our paper takes a step towards real-world physical commonsense reasoning by contributing PACS: the first audiovisual benchmark annotated for physical commonsense attributes. PACS contains a total of 13,400 question-answer pairs, spanning 1,377 unique physical commonsense questions and 1,526 videos. Our dataset opens new opportunities to advance research in physical reasoning by making audio a core component of this multimodal problem. Using PACS, we evaluate multiple state-of-the-art models on this challenging new task. While some models show promising results (70% accuracy), they all fall short of human performance (95% accuracy). We conclude by demonstrating the importance of multimodal reasoning and outlining possible avenues for future research.