Micah Carroll

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

Nov 04, 2024

Beyond Preferences in AI Alignment

Aug 30, 2024

AI Alignment with Changing and Influenceable Reward Functions

May 28, 2024

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023

Who Needs to Know? Minimal Knowledge for Optimal Coordination

Jun 15, 2023

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

Nov 30, 2022

UniMASK: Unified Inference in Sequential Decision Problems

Nov 20, 2022

Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration

Nov 19, 2022

Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Apr 28, 2022

Estimating and Penalizing Induced Preference Shifts in Recommender Systems

Apr 25, 2022