Yongbin Li

Scaling Self-Evolving Agents via Parametric Memory

Jun 03, 2026
Via arXiv

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

Jun 02, 2026
Via arXiv

ESPO: Early-Stopping Proximal Policy Optimization

May 28, 2026
Via arXiv

Think Anywhere in Code Generation

Apr 02, 2026
Via arXiv

MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

Feb 28, 2026
Via arXiv

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling

Feb 12, 2026
Via arXiv

Beyond Quantity: Trajectory Diversity Scaling for Code Agents

Feb 03, 2026
Via arXiv

ExpSeek: Self-Triggered Experience Seeking for Web Agents

Jan 13, 2026
Via arXiv

Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions

Jan 12, 2026
Via arXiv

Reward Modeling from Natural Language Human Feedback

Jan 12, 2026
Via arXiv