Yansong Tang

Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings

Feb 14, 2026
Via arXiv

ChatUMM: Robust Context Tracking for Conversational Interleaved Generation

Feb 06, 2026
Via arXiv

CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos

Jan 07, 2026
Via arXiv

DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation

Dec 23, 2025
Via arXiv

Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation

Dec 23, 2025
Via arXiv

AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent

Dec 23, 2025
Via arXiv

Human-in-the-loop Online Rejection Sampling for Robotic Manipulation

Oct 30, 2025
Via arXiv

VLA-Reasoner: Empowering Vision-Language-Action Models with Reasoning via Online Monte Carlo Tree Search

Sep 26, 2025
Via arXiv

DSPv2: Improved Dense Policy for Effective and Generalizable Whole-body Mobile Manipulation

Sep 19, 2025
Via arXiv

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion

Sep 09, 2025
Via arXiv