Alex Fréchette

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs

Mar 19, 2025
Via arXiv