Shawn Im

A Unified Understanding and Evaluation of Steering Methods

Feb 04, 2025
Via arXiv

On the Generalization of Preference Learning with DPO

Aug 06, 2024
Via arXiv

Understanding the Learning Dynamics of Alignment with Human Feedback

Apr 08, 2024
Via arXiv

Evaluating the Utility of Model Explanations for Model Development

Dec 10, 2023
Via arXiv