José Hernández-Orallo

Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI

Mar 27, 2025
Via arXiv

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Mar 09, 2025
Via arXiv

Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture

Feb 21, 2025
Via arXiv

PredictaBoard: Benchmarking LLM Score Predictability

Feb 20, 2025
Via arXiv

Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers

Oct 15, 2024
Via arXiv

100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances

Sep 05, 2024
Via arXiv

Learning Alternative Ways of Performing a Task

Apr 03, 2024
Via arXiv

Animal-AI 3: What's New & Why You Should Care

Dec 18, 2023
Via arXiv

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

Nov 06, 2023
Via arXiv

Predictable Artificial Intelligence

Oct 09, 2023
Via arXiv