Xiang Bai

Huazhong University of Science and Technology

Generative Compositor for Few-Shot Visual Information Extraction

Mar 21, 2025

Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception

Mar 17, 2025

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Mar 14, 2025

MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting

Mar 11, 2025

PathVG: A New Benchmark and Dataset for Pathology Visual Grounding

Feb 28, 2025

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

Feb 22, 2025

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

Jan 24, 2025

Training-free Ultra Small Model for Universal Sparse Reconstruction in Compressed Sensing

Jan 20, 2025

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

Jan 07, 2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Dec 31, 2024