John Burden

Shammie

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Mar 09, 2025
Via arXiv

Framing the Game: How Context Shapes LLM Decision-Making

Mar 05, 2025
Via arXiv

Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture

Feb 21, 2025
Via arXiv

Conversational Complexity for Assessing Risk in Large Language Models

Sep 02, 2024
Via arXiv

Evaluating AI Evaluation: Perils and Prospects

Jul 12, 2024
Via arXiv

Animal-AI 3: What's New & Why You Should Care

Dec 18, 2023
Via arXiv

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

Nov 06, 2023
Via arXiv

Predictable Artificial Intelligence

Oct 09, 2023
Via arXiv

Inferring Capabilities from Task Performance with Bayesian Triangulation

Sep 21, 2023
Via arXiv

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022
Via arXiv