Nikhil Vyas

The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton

Oct 10, 2025

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants

Feb 04, 2025

Loss-to-Loss Prediction: Scaling Laws for All Datasets

Nov 19, 2024

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024

Mixture of Parrots: Experts improve memorization more than reasoning

Oct 24, 2024

SOAP: Improving and Stabilizing Shampoo using Adam

Sep 17, 2024

Deconstructing What Makes a Good Optimizer for Language Models

Jul 10, 2024

A New Perspective on Shampoo's Preconditioner

Jun 25, 2024

Distinguishing the Knowable from the Unknowable with Language Models

Feb 05, 2024

On Privileged and Convergent Bases in Neural Network Representations

Jul 24, 2023