Bettina Messmer

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Oct 31, 2024
Via arXiv

On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists

Sep 20, 2024
Via arXiv

Towards an empirical understanding of MoE design choices

Feb 20, 2024
Via arXiv

Rotational Optimizers: Simple & Robust DNN Training

May 26, 2023
Via arXiv