Depen Morwani

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants

Feb 04, 2025

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024

SOAP: Improving and Stabilizing Shampoo using Adam

Sep 17, 2024

Deconstructing What Makes a Good Optimizer for Language Models

Jul 10, 2024

A New Perspective on Shampoo's Preconditioner

Jun 25, 2024

Feature emergence via margin maximization: case studies in algebraic tasks

Nov 13, 2023

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Jun 14, 2023

Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

May 28, 2023

Simplicity Bias in 1-Hidden Layer Neural Networks

Feb 01, 2023

Using noise resilience for ranking generalization of deep neural networks

Dec 16, 2020