Ruochen Zhou

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

Mar 04, 2025
Via arXiv

Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

May 22, 2024
Via arXiv