Haodong Duan

MM-IFEngine: Towards Multimodal Instruction Following

Apr 10, 2025
Via arXiv

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Apr 03, 2025
Via arXiv

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?

Mar 25, 2025
Via arXiv

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Mar 19, 2025
Via arXiv

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Mar 13, 2025
Via arXiv

Information Density Principle for MLLM Benchmarks

Mar 13, 2025
Via arXiv

Image Quality Assessment: From Human to Machine Preference

Mar 13, 2025
Via arXiv

Visual-RFT: Visual Reinforcement Fine-Tuning

Mar 03, 2025
Via arXiv

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Feb 25, 2025
Via arXiv

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Feb 07, 2025
Via arXiv