Goran Radanović

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Mar 04, 2024
Via arXiv

Corruption-Robust Offline Two-Player Zero-Sum Markov Games

Mar 04, 2024
Via arXiv

Corruption Robust Offline Reinforcement Learning with Human Feedback

Feb 09, 2024
Via arXiv