Anand Siththaranjan

AI Alignment with Changing and Influenceable Reward Functions

May 28, 2024

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

Dec 13, 2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023

Analyzing Human Models that Adapt Online

Mar 09, 2021