Rachel Freedman

Linear Probe Penalties Reduce LLM Sycophancy

Dec 01, 2024
Via arXiv

Social Choice for AI Alignment: Dealing with Diverse Human Feedback

Apr 16, 2024
Via arXiv

Active teacher selection for reinforcement learning from human feedback

Oct 23, 2023
Via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Via arXiv

Active Reward Learning from Multiple Teachers

Mar 02, 2023
Via arXiv

The Expertise Problem: Learning from Specialized Feedback

Nov 12, 2022
Via arXiv

Choice Set Misspecification in Reward Inference

Jan 19, 2021
Via arXiv

Aligning with Heterogeneous Preferences for Kidney Exchange

Jun 16, 2020
Via arXiv

Adapting a Kidney Exchange Algorithm to Align with Human Values

May 19, 2020
Via arXiv