Hang Xu

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning

Apr 08, 2025
Via arXiv

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Apr 03, 2025
Via arXiv

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Mar 29, 2025
Via arXiv

DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation

Mar 27, 2025
Via arXiv

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Mar 20, 2025
Via arXiv

ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory

Mar 16, 2025
Via arXiv

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Mar 12, 2025
Via arXiv

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

Mar 09, 2025
Via arXiv

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Mar 08, 2025
Via arXiv

Towards Heisenberg limit without critical slowing down via quantum reinforcement learning

Mar 04, 2025
Via arXiv