
Joar Skalse

Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting

Dec 15, 2024 · via arXiv

Partial Identifiability and Misspecification in Inverse Reinforcement Learning

Nov 24, 2024 · via arXiv

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

Jun 22, 2024 · via arXiv

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

May 10, 2024 · via arXiv

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

Mar 11, 2024 · via arXiv

On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks

Jan 26, 2024 · via arXiv

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

Oct 18, 2023 · via arXiv

Goodhart's Law in Reinforcement Learning

Oct 13, 2023 · via arXiv

STARC: A General Framework For Quantifying Differences Between Reward Functions

Sep 26, 2023 · via arXiv

Lexicographic Multi-Objective Reinforcement Learning

Dec 28, 2022 · via arXiv