John Burden

Shammie

Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

Feb 24, 2026

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Mar 09, 2025

Framing the Game: How Context Shapes LLM Decision-Making

Mar 05, 2025

Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture

Feb 21, 2025

Conversational Complexity for Assessing Risk in Large Language Models

Sep 02, 2024

Evaluating AI Evaluation: Perils and Prospects

Jul 12, 2024

Animal-AI 3: What's New & Why You Should Care

Dec 18, 2023

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

Nov 06, 2023

Predictable Artificial Intelligence

Oct 09, 2023

Inferring Capabilities from Task Performance with Bayesian Triangulation

Sep 21, 2023