Houda Nait El Barj

Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization

Jan 14, 2024
Via arXiv