Picture for Delin Qu

Delin Qu

Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

Add code
Dec 12, 2025
Viaarxiv icon

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Add code
Dec 11, 2025
Viaarxiv icon

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

Add code
Sep 09, 2025
Figure 1 for F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
Figure 2 for F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
Figure 3 for F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
Figure 4 for F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
Viaarxiv icon

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Add code
Aug 28, 2025
Viaarxiv icon

Hume: Introducing System-2 Thinking in Visual-Language-Action Model

Add code
May 29, 2025
Figure 1 for Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Figure 2 for Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Figure 3 for Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Figure 4 for Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Viaarxiv icon

Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

Add code
May 27, 2025
Figure 1 for Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Figure 2 for Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Figure 3 for Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Figure 4 for Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Viaarxiv icon

Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation

Add code
Apr 01, 2025
Viaarxiv icon

Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Add code
Mar 11, 2025
Figure 1 for Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
Figure 2 for Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
Figure 3 for Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
Figure 4 for Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models
Viaarxiv icon

Exploring the Potential of Encoder-free Architectures in 3D LMMs

Add code
Feb 13, 2025
Figure 1 for Exploring the Potential of Encoder-free Architectures in 3D LMMs
Figure 2 for Exploring the Potential of Encoder-free Architectures in 3D LMMs
Figure 3 for Exploring the Potential of Encoder-free Architectures in 3D LMMs
Figure 4 for Exploring the Potential of Encoder-free Architectures in 3D LMMs
Viaarxiv icon

SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

Add code
Jan 27, 2025
Figure 1 for SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
Figure 2 for SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
Figure 3 for SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
Figure 4 for SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
Viaarxiv icon