Xinlong Wang

DECO: Decoupled Multimodal Diffusion Transformer for Bimanual Dexterous Manipulation with a Plugin Tactile Adapter

Feb 05, 2026
Via arXiv

EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models

Feb 04, 2026
Via arXiv

LINA: Linear Autoregressive Image Generative Models with Continuous Tokens

Jan 30, 2026
Via arXiv

Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner

Dec 11, 2025
Via arXiv

Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments

Oct 30, 2025
Via arXiv

Emu3.5: Native Multimodal Models are World Learners

Oct 30, 2025
Via arXiv

BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP

Oct 22, 2025
Via arXiv

CI-VID: A Coherent Interleaved Text-Video Dataset

Jul 02, 2025
Via arXiv

Unified Vision-Language-Action Model

Jun 24, 2025
Via arXiv

OmniGen2: Exploration to Advanced Multimodal Generation

Jun 23, 2025
Via arXiv