
Leon Lang

Michael Pokorny

Modeling Human Beliefs about AI Behavior for Scalable Oversight

Feb 28, 2025

Humanity's Last Exam

Jan 24, 2025

Factored space models: Towards causality between levels of abstraction

Dec 03, 2024

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

Jun 22, 2024

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Mar 03, 2024

A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels

Oct 22, 2020

Learning to Request Guidance in Emergent Communication

Dec 11, 2019