Wen Sun

Efficient Imitation Under Misspecification

Mar 17, 2025

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

Mar 14, 2025

All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning

Mar 03, 2025

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

Feb 27, 2025

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025

Diffusing States and Matching Scores: A New Framework for Imitation Learning

Oct 17, 2024

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Oct 06, 2024

Two-Timescale Synchronization and Migration for Digital Twin Networks: A Multi-Agent Deep Reinforcement Learning Approach

Sep 02, 2024

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Aug 16, 2024

Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

Jul 18, 2024