Steven M. Frankland

Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem

Oct 31, 2024

Declan Campbell, Sunayana Rane, Tyler Giallanza, Nicolò De Sabbata, Kia Ghods, Amogh Joshi, Alexander Ku, Steven M. Frankland, Thomas L. Griffiths, Jonathan D. Cohen (+1 more)

Abstract: Recent work has documented striking heterogeneity in the performance of state-of-the-art vision language models (VLMs), including both multimodal language models and text-to-image models. These models are able to describe and generate a diverse array of complex, naturalistic images, yet they exhibit surprising failures on basic multi-object reasoning tasks -- such as counting, localization, and simple forms of visual analogy -- that humans perform with near perfect accuracy. To better understand this puzzling pattern of successes and failures, we turn to theoretical accounts of the binding problem in cognitive science and neuroscience, a fundamental problem that arises when a shared set of representational resources must be used to represent distinct entities (e.g., to represent multiple objects in an image), necessitating the use of serial processing to avoid interference. We find that many of the puzzling failures of state-of-the-art VLMs can be explained as arising due to the binding problem, and that these failure modes are strikingly similar to the limitations exhibited by rapid, feedforward processing in the human brain.

The Relational Bottleneck as an Inductive Bias for Efficient Abstraction

Sep 12, 2023

Taylor W. Webb, Steven M. Frankland, Awni Altabaa, Kamesh Krishnamurthy, Declan Campbell, Jacob Russin, Randall O'Reilly, John Lafferty, Jonathan D. Cohen

Abstract: A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This effort has often been framed in terms of a dichotomy between empiricist and nativist approaches, most recently embodied by debates concerning deep neural networks and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.

Learning Representations that Support Extrapolation

Jul 09, 2020

Taylor W. Webb, Zachary Dulberg, Steven M. Frankland, Alexander A. Petrov, Randall C. O'Reilly, Jonathan D. Cohen

Abstract: Extrapolation -- the ability to make inferences that go beyond the scope of one's experiences -- is a hallmark of human intelligence. By contrast, the generalization exhibited by contemporary neural network algorithms is largely limited to interpolation between data points in their training corpora. In this paper, we consider the challenge of learning representations that support extrapolation. We introduce a novel visual analogy benchmark that allows the graded evaluation of extrapolation as a function of distance from the convex domain defined by the training data. We also introduce a simple technique, context normalization, that encourages representations that emphasize the relations between objects. We find that this technique enables a significant improvement in the ability to extrapolate, considerably outperforming a number of competitive techniques.

* ICML 2020
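
The abstract above names context normalization but does not spell out the computation. As a rough illustration only, the sketch below assumes it means z-scoring each feature dimension across the objects that appear together in one context (for example, the items of a single analogy problem), followed by an optional per-dimension gain and shift. The function name `context_normalize` and the parameters `gain`, `shift`, and `eps` are hypothetical, chosen for this example rather than taken from the paper.

```python
import math

def context_normalize(context, gain=None, shift=None, eps=1e-8):
    """Z-score each feature dimension across the objects in one context.

    context: list of k object embeddings, each a list of d floats.
    gain, shift: optional per-dimension scale and offset (length d);
                 hypothetical stand-ins for learned parameters.
    """
    k = len(context)
    d = len(context[0])
    gain = gain if gain is not None else [1.0] * d
    shift = shift if shift is not None else [0.0] * d
    # Statistics are computed over the k objects of this context only,
    # so absolute feature magnitudes are discarded.
    means = [sum(obj[j] for obj in context) / k for j in range(d)]
    variances = [
        sum((obj[j] - means[j]) ** 2 for obj in context) / k for j in range(d)
    ]
    return [
        [
            gain[j] * (obj[j] - means[j]) / math.sqrt(variances[j] + eps) + shift[j]
            for j in range(d)
        ]
        for obj in context
    ]

# Two contexts with the same relational structure at different magnitudes
# (assumed toy data, one feature per object).
near = [[1.0], [2.0], [3.0]]
far = [[101.0], [102.0], [103.0]]
near_normalized = context_normalize(near)
far_normalized = context_normalize(far)
```

Under this assumed formulation, `near` and `far` map to the same normalized values, because only each object's position relative to the others in its context survives. That is one way a representation could emphasize relations between objects and carry over to feature ranges outside those seen in training.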
