Chuyi He

From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents

Jan 30, 2026
Via arXiv

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Aug 13, 2025
Via arXiv

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

May 30, 2025
Via arXiv

On Designing Effective RL Reward at Training Time for LLM Reasoning

Oct 19, 2024
Via arXiv