Yuxuan Tong

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL

Feb 06, 2026
Via arXiv

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Dec 28, 2025
Via arXiv

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning

Dec 28, 2025
Via arXiv

Laminar: A Scalable Asynchronous RL Post-Training Framework

Oct 14, 2025
Via arXiv

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Mar 18, 2025
Via arXiv

Demystifying Long Chain-of-Thought Reasoning in LLMs

Feb 05, 2025
Via arXiv

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Apr 13, 2023
Via arXiv