Tai-wei Chang

R3HF: Reward Redistribution for Enhancing Reinforcement Learning from Human Feedback

Nov 13, 2024
Via arXiv