
Dieuwke Hupkes

Compute Optimal Scaling of Skills: Knowledge vs Reasoning

Mar 13, 2025

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks

Feb 24, 2025

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Feb 20, 2025

Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models

Nov 21, 2024

Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?

Nov 06, 2024

The Llama 3 Herd of Models

Jul 31, 2024

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Jun 18, 2024

Quantifying Variance in Evaluation Benchmarks

Jun 14, 2024

Interpretability of Language Models via Task Spaces

Jun 10, 2024

From Form to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency

Apr 18, 2024