Sanmi Koyejo

Stanford University

Metric Match: A Subset Selection Approach to Evaluating LLM Judge Reliability

Jun 12, 2026
Via arXiv

Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System

Jun 10, 2026
Via arXiv

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

Jun 09, 2026
Via arXiv

CARE: A Conformal Safety Layer for Medical Summarization

Jun 08, 2026
Via arXiv

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

Jun 06, 2026
Via arXiv

Beyond Homophily: Towards Generalized Graph Reconstruction Attack and Defense

Jun 06, 2026
Via arXiv

The Easy, the Hard, and the Learnable: Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning

Jun 06, 2026
Via arXiv

Welfare, Improvability, and Variance: A Principal-Agent Approach to Optimal Benchmark Item Aggregation

May 29, 2026
Via arXiv

Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency

May 27, 2026
Via arXiv

AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

May 24, 2026
Via arXiv