Wenxuan Song

S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight

Mar 17, 2026

PROSPECT: Unified Streaming Vision-Language Navigation via Semantic-Spatial Fusion and Latent Predictive Representation

Mar 04, 2026

Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline

Feb 26, 2026

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

Feb 19, 2026

Designing KRIYA: An AI Companion for Wellbeing Self-Reflection

Jan 21, 2026

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

Dec 28, 2025

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Dec 10, 2025

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Sep 11, 2025

FlowVLA: Thinking in Motion with a Visual Chain of Thought

Aug 25, 2025

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver

Aug 14, 2025