David Getzen

Are PPO-ed Language Models Hackable?

May 28, 2024
Via arXiv