Anka Reuel

Michael Pokorny

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Jun 12, 2026

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

Jun 09, 2026

Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation

May 29, 2026

AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

May 24, 2026

When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation

Feb 18, 2026

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Nov 06, 2025

Risk Management for Mitigating Benchmark Failure Modes: BenchRisk

Oct 24, 2025

Measurement to Meaning: A Validity-Centered Framework for AI Evaluation

May 13, 2025

Artificial Intelligence Index Report 2025

Apr 08, 2025

Multi-Agent Risks from Advanced AI

Feb 19, 2025