Martin Jaggi

EPFL

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Oct 31, 2024

Improving Stochastic Cubic Newton with Momentum

Oct 25, 2024

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

Oct 07, 2024

On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists

Sep 20, 2024

CoBo: Collaborative Learning via Bilevel Optimization

Sep 09, 2024

A New First-Order Meta-Learning Algorithm with Convergence Guarantees

Sep 05, 2024

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

Aug 07, 2024

Effective Interplay between Sparsity and Quantization: From Theory to Practice

May 31, 2024

Deep Grokking: Would Deep Neural Networks Generalize Better?

May 29, 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

May 29, 2024