Moritz Hardt

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data

Oct 17, 2024
Via arXiv

Lawma: The Power of Specialization for Legal Tasks

Jul 23, 2024
Via arXiv

Evaluating language models as risk scores

Jul 19, 2024
Via arXiv

Training on the Test Task Confounds Evaluation and Emergence

Jul 10, 2024
Via arXiv

Allocation Requires Prediction Only if Inequality Is Low

Jun 19, 2024
Via arXiv

An engine not a camera: Measuring performative power of online search

May 29, 2024
Via arXiv

Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks

May 06, 2024
Via arXiv

ImageNot: A contrast with ImageNet preserves model rankings

Apr 02, 2024
Via arXiv

Predictors from causal features do not generalize better to new domains

Feb 15, 2024
Via arXiv

Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Feb 03, 2024
Via arXiv