Eliya Habba

When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation

Feb 18, 2026
Via arXiv

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Nov 06, 2025
Via arXiv

JSON Whisperer: Efficient JSON Editing with LLMs

Oct 06, 2025
Via arXiv

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

May 28, 2025
Via arXiv

DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

Mar 04, 2025
Via arXiv

Beyond Benchmarks: On The False Promise of AI Regulation

Jan 26, 2025
Via arXiv

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

Jul 28, 2024
Via arXiv

The Perfect Victim: Computational Analysis of Judicial Attitudes towards Victims of Sexual Violence

May 09, 2023
Via arXiv