Stephane Hatgis-Kessell

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

May 29, 2026
Via arXiv

Influencing Humans to Conform to Preference Models for RLHF

Jan 11, 2025
Via arXiv

Learning Optimal Advantage from Preferences and Mistaking it for Reward

Oct 03, 2023
Via arXiv

Models of human preference for learning reward functions

Jun 05, 2022
Via arXiv