Picture for Tieniu Tan

Tieniu Tan

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Add code
Jun 11, 2025
Viaarxiv icon

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Add code
Jun 10, 2025
Viaarxiv icon

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Add code
Jun 09, 2025
Viaarxiv icon

REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing

Add code
May 25, 2025
Viaarxiv icon

Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory

Add code
May 16, 2025
Viaarxiv icon

A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

Add code
Apr 21, 2025
Viaarxiv icon

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Add code
Apr 07, 2025
Viaarxiv icon

Aligning Multimodal LLM with Human Preference: A Survey

Add code
Mar 18, 2025
Viaarxiv icon

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation

Add code
Feb 18, 2025
Viaarxiv icon

Towards Compatible Fine-tuning for Vision-Language Model Updates

Add code
Dec 30, 2024
Viaarxiv icon