Depen Morwani

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024
Via arXiv

SOAP: Improving and Stabilizing Shampoo using Adam

Sep 17, 2024
Via arXiv

Deconstructing What Makes a Good Optimizer for Language Models

Jul 10, 2024
Via arXiv

A New Perspective on Shampoo's Preconditioner

Jun 25, 2024
Via arXiv

Feature emergence via margin maximization: case studies in algebraic tasks

Nov 13, 2023
Via arXiv

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Jun 14, 2023
Via arXiv

Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

May 28, 2023
Via arXiv

Simplicity Bias in 1-Hidden Layer Neural Networks

Feb 01, 2023
Via arXiv

Using noise resilience for ranking generalization of deep neural networks

Dec 16, 2020
Via arXiv

Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets

Oct 24, 2020
Via arXiv