Yilin Bao

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Dec 20, 2024
Via arXiv