Tong He

DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation

Jul 03, 2025
Via arXiv

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers

Jul 01, 2025
Via arXiv

Sekai: A Video Dataset towards World Exploration

Jun 18, 2025
Via arXiv

Explicit Preference Optimization: No Need for an Implicit Reward Model

Jun 09, 2025
Via arXiv

Sparse Autoencoders, Again?

Jun 06, 2025
Via arXiv

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

May 30, 2025
Via arXiv

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

May 22, 2025
Via arXiv

Aether: Geometric-Aware Unified World Modeling

Mar 25, 2025
Via arXiv

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Mar 07, 2025
Via arXiv

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Feb 25, 2025
Via arXiv