Penghui Qi

Rethinking the Trust Region in LLM Reinforcement Learning

Feb 04, 2026
Via arXiv

Revisiting Parameter Server in LLM Post-Training

Jan 27, 2026
Via arXiv

Defeating the Training-Inference Mismatch via FP16

Oct 30, 2025
Via arXiv

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

May 19, 2025
Via arXiv

Understanding R1-Zero-Like Training: A Critical Perspective

Mar 26, 2025
Via arXiv

PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization

Mar 03, 2025
Via arXiv

Pipeline Parallelism with Controllable Memory

May 24, 2024
Via arXiv

SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

Dec 24, 2020
Via arXiv

Artificial Intelligence for Prosthetics - challenge solutions

Feb 07, 2019
Via arXiv