Sham Kakade

Data-Efficient Multi-Agent Spatial Planning with LLMs

Feb 26, 2025

Distributional Scaling Laws for Emergent Capabilities

Feb 24, 2025

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

Feb 10, 2025

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants

Feb 04, 2025

Soup to go: mitigating forgetting during continual learning with model averaging

Jan 09, 2025

From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos

Dec 10, 2024

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Dec 03, 2024

Loss-to-Loss Prediction: Scaling Laws for All Datasets

Nov 19, 2024

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Oct 16, 2024