Micah Carroll

Tony

Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction

Jan 28, 2026
Via arXiv

Monitoring Monitorability

Dec 20, 2025
Via arXiv

OpenAI GPT-5 System Card

Dec 19, 2025
Via arXiv

Robust and Diverse Multi-Agent Learning via Rational Policy Gradient

Nov 12, 2025
Via arXiv

CTRL-Rec: Controlling Recommender Systems With Natural Language

Oct 14, 2025
Via arXiv

Humanity's Last Exam

Jan 24, 2025
Via arXiv

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

Nov 04, 2024
Via arXiv

Beyond Preferences in AI Alignment

Aug 30, 2024
Via arXiv

AI Alignment with Changing and Influenceable Reward Functions

May 28, 2024
Via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Via arXiv