Yingru Li

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Feb 05, 2026
Via arXiv

Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It

Feb 02, 2026
Via arXiv

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning

Dec 28, 2025
Via arXiv

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Dec 28, 2025
Via arXiv

A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms

Dec 28, 2025
Via arXiv

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Sep 11, 2025
Via arXiv

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Aug 08, 2025
Via arXiv

Logit Dynamics in Softmax Policy Gradient Methods

Jun 15, 2025
Via arXiv

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

May 29, 2025
Via arXiv

Divergence-Augmented Policy Optimization

Jan 25, 2025
Via arXiv