
Xiang Bai

Huazhong University of Science and Technology

DINO Eats CLIP: Adapting Beyond Knowns for Open-set 3D Object Retrieval

Apr 21, 2026, via arXiv

Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA

Apr 15, 2026, via arXiv

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

Apr 14, 2026, via arXiv

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

Apr 09, 2026, via arXiv

PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding

Apr 06, 2026, via arXiv

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Mar 30, 2026, via arXiv

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Mar 26, 2026, via arXiv

Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Mar 19, 2026, via arXiv

Towards Generalizable Robotic Manipulation in Dynamic Environments

Mar 16, 2026, via arXiv

Multimodal OCR: Parse Anything from Documents

Mar 13, 2026, via arXiv