Title: Visual Scratchpads: Enabling Global Reasoning in Vision

Oct 10, 2024

Aryo Lotfi, Enrico Fini, Samy Bengio, Moin Nabi, Emmanuel Abbe

Abstract: Modern vision models have achieved remarkable success on benchmarks where local features provide critical information about the target. There is now growing interest in solving tasks that require more global reasoning, where local features offer no significant information. These tasks are reminiscent of the connectivity tasks discussed by Minsky and Papert in 1969, which exposed the limitations of the perceptron model and contributed to the first AI winter. In this paper, we revisit such tasks by introducing four global visual benchmarks involving path finding and mazes. We show that: (1) although today's large vision models largely surpass the expressivity limitations of the early models, they still struggle with learning efficiency; we put forward the notion of "globality degree" to understand this limitation; (2) the picture changes and global reasoning becomes feasible with the introduction of "visual scratchpads"; similarly to the text scratchpads and chain-of-thought used in language models, visual scratchpads help break down global tasks into simpler ones; (3) some scratchpads are better than others; in particular, "inductive scratchpads" that take steps relying on less information afford better out-of-distribution generalization and succeed for smaller model sizes.
