Xinggang Wang

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling

Mar 17, 2025
Via arXiv

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

Mar 13, 2025
Via arXiv

Towards Fast, Memory-based and Data-Efficient Vision-Language Policy

Mar 13, 2025
Via arXiv

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

Mar 11, 2025
Via arXiv

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Mar 10, 2025
Via arXiv

Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation

Feb 18, 2025
Via arXiv

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Feb 18, 2025
Via arXiv

SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild

Jan 07, 2025
Via arXiv

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Jan 06, 2025
Via arXiv

GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images

Dec 19, 2024
Via arXiv