Akbir Khan

Auditing language models for hidden objectives

Mar 14, 2025

Multi-Agent Risks from Advanced AI

Feb 19, 2025

Alignment faking in large language models

Dec 18, 2024

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

Nov 26, 2024

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Nov 20, 2024

Debating with More Persuasive LLMs Leads to More Truthful Answers

Feb 15, 2024

Leading the Pack: N-player Opponent Shaping

Dec 26, 2023

Scaling Opponent Shaping to High Dimensional Games

Dec 19, 2023

JaxMARL: Multi-Agent RL Environments in JAX

Nov 20, 2023

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

Mar 06, 2023