Yue Fan

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space

Dec 17, 2025

Self-Evolving 3D Scene Generation from a Single Image

Dec 09, 2025

"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

Jul 17, 2025

LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation

Jun 11, 2025

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Jun 05, 2025

GRIT: Teaching MLLMs to Think with Images

May 21, 2025

Rethinking Visual Layer Selection in Multimodal LLMs

Apr 30, 2025

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Mar 08, 2025

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Feb 22, 2025

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration

Jan 27, 2025