Picture for Lin Yan

Lin Yan

Truncated Proximal Policy Optimization

Add code
Jun 18, 2025
Viaarxiv icon

PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

Add code
Jun 12, 2025
Viaarxiv icon

Seed1.5-VL Technical Report

Add code
May 11, 2025
Viaarxiv icon

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Add code
Apr 08, 2025
Viaarxiv icon

A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization

Add code
Apr 07, 2025
Viaarxiv icon

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Add code
Mar 31, 2025
Viaarxiv icon

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Add code
Mar 18, 2025
Viaarxiv icon

What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret

Add code
Mar 03, 2025
Viaarxiv icon

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models

Add code
Oct 28, 2024
Figure 1 for Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Figure 2 for Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Figure 3 for Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Figure 4 for Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Viaarxiv icon

Process Supervision-Guided Policy Optimization for Code Generation

Add code
Oct 23, 2024
Figure 1 for Process Supervision-Guided Policy Optimization for Code Generation
Figure 2 for Process Supervision-Guided Policy Optimization for Code Generation
Figure 3 for Process Supervision-Guided Policy Optimization for Code Generation
Figure 4 for Process Supervision-Guided Policy Optimization for Code Generation
Viaarxiv icon