Yuda Song

Outcome-based Exploration for LLM Reasoning

Sep 08, 2025

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Dec 03, 2024

Hybrid Reinforcement Learning from Offline Observation Alone

Jun 11, 2024

Understanding Preference Fine-Tuning Through the Lens of Coverage

Jun 03, 2024

Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

May 29, 2024

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Mar 25, 2024

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

Nov 14, 2023

The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

Mar 01, 2023

ClassPruning: Speed Up Image Restoration Networks by Dynamic N:M Pruning

Nov 10, 2022

Representation Learning for General-sum Low-rank Markov Games

Oct 30, 2022