Vivian Y. Nastl

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data

Oct 17, 2024
Via arXiv

Predictors from causal features do not generalize better to new domains

Feb 15, 2024
Via arXiv