Chengzu Li

How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

Feb 02, 2026
Via arXiv

Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

Jan 28, 2026
Via arXiv

Confidence Estimation for LLMs in Multi-turn Interactions

Jan 05, 2026
Via arXiv

11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis

Aug 27, 2025
Via arXiv

Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation

May 29, 2025
Via arXiv

Enriching Patent Claim Generation with European Patent Dataset

May 18, 2025
Via arXiv

Visual Planning: Let's Think Only with Images

May 16, 2025
Via arXiv

A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

Apr 21, 2025
Via arXiv

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

Jan 13, 2025
Via arXiv

TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

Jun 04, 2024
Via arXiv