Adam Khoja

Michael Pokorny

Multi-Agent Inverse Q-Learning from Demonstrations

Mar 06, 2025
Via arXiv

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

Mar 05, 2025
Via arXiv

EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

Feb 13, 2025
Via arXiv

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Feb 12, 2025
Via arXiv

Humanity's Last Exam

Jan 24, 2025
Via arXiv

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Jul 31, 2024
Via arXiv

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024
Via arXiv