
Yuda Song

Expanding the Capabilities of Reinforcement Learning via Text Feedback

Feb 02, 2026 (via arXiv)

Maximum Likelihood Reinforcement Learning

Feb 02, 2026 (via arXiv)

Towards Scalable Pre-training of Visual Tokenizers for Generation

Dec 15, 2025 (via arXiv)

Outcome-based Exploration for LLM Reasoning

Sep 08, 2025 (via arXiv)

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Dec 03, 2024 (via arXiv)

Hybrid Reinforcement Learning from Offline Observation Alone

Jun 11, 2024 (via arXiv)

Understanding Preference Fine-Tuning Through the Lens of Coverage

Jun 03, 2024 (via arXiv)

Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

May 29, 2024 (via arXiv)

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Mar 25, 2024 (via arXiv)

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

Nov 14, 2023 (via arXiv)