Title: Towards A Unified Neural Architecture for Visual Recognition and Reasoning

Nov 10, 2023

Calvin Luo, Boqing Gong, Ting Chen, Chen Sun

Abstract: Recognition and reasoning are two pillars of visual understanding. However, the two tasks have received unequal attention: whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been comparatively little success in solving visual reasoning. Intuitively, unifying these two tasks under a single framework is desirable, as they are mutually dependent and beneficial. Motivated by the recent success of multi-task transformers for visual recognition and language understanding, we propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both. Our framework enables the principled investigation of how different visual recognition tasks, datasets, and inductive biases can help enable spatiotemporal reasoning capabilities. Notably, we find that object detection, which requires spatial localization of individual objects, is the most beneficial recognition task for reasoning. We further demonstrate via probing that implicit object-centric representations emerge automatically inside our framework. Intriguingly, we discover that certain architectural choices, such as the backbone model of the visual encoder, have a significant impact on visual reasoning but little impact on object detection. Given the results of our experiments, we believe that visual reasoning should be considered a first-class citizen alongside visual recognition, as they are strongly correlated but benefit from potentially different design choices.

View paper on OpenReview
