Jonas Geiping

Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Jul 03, 2025
Via arXiv

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling

Jun 14, 2025
Via arXiv

Capability-Based Scaling Laws for LLM Red-Teaming

May 26, 2025
Via arXiv

Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models

Apr 08, 2025
Via arXiv

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Feb 26, 2025
Via arXiv

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers

Feb 12, 2025
Via arXiv

When, Where and Why to Average Weights?

Feb 10, 2025
Via arXiv

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Feb 07, 2025
Via arXiv

Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging

Feb 06, 2025
Via arXiv

Great Models Think Alike and this Undermines AI Oversight

Feb 06, 2025
Via arXiv