Picture for Sham Kakade

Sham Kakade

From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos

Add code
Dec 10, 2024
Viaarxiv icon

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Add code
Dec 03, 2024
Viaarxiv icon

Loss-to-Loss Prediction: Scaling Laws for All Datasets

Add code
Nov 19, 2024
Figure 1 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Figure 2 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Figure 3 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Figure 4 for Loss-to-Loss Prediction: Scaling Laws for All Datasets
Viaarxiv icon

How Does Critical Batch Size Scale in Pre-training?

Add code
Oct 29, 2024
Figure 1 for How Does Critical Batch Size Scale in Pre-training?
Figure 2 for How Does Critical Batch Size Scale in Pre-training?
Figure 3 for How Does Critical Batch Size Scale in Pre-training?
Figure 4 for How Does Critical Batch Size Scale in Pre-training?
Viaarxiv icon

Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond

Add code
Oct 16, 2024
Viaarxiv icon

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Add code
Oct 16, 2024
Figure 1 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Figure 2 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Figure 3 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Figure 4 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Viaarxiv icon

SOAP: Improving and Stabilizing Shampoo using Adam

Add code
Sep 17, 2024
Viaarxiv icon

Deconstructing What Makes a Good Optimizer for Language Models

Add code
Jul 10, 2024
Viaarxiv icon

Universal Length Generalization with Turing Programs

Add code
Jul 03, 2024
Viaarxiv icon

A New Perspective on Shampoo's Preconditioner

Add code
Jun 25, 2024
Viaarxiv icon