Picture for Sham Kakade

Sham Kakade

How Does Critical Batch Size Scale in Pre-training?

Add code
Oct 29, 2024
Viaarxiv icon

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Add code
Oct 16, 2024
Figure 1 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Figure 2 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Figure 3 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Figure 4 for LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Viaarxiv icon

Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond

Add code
Oct 16, 2024
Viaarxiv icon

SOAP: Improving and Stabilizing Shampoo using Adam

Add code
Sep 17, 2024
Viaarxiv icon

Deconstructing What Makes a Good Optimizer for Language Models

Add code
Jul 10, 2024
Viaarxiv icon

Universal Length Generalization with Turing Programs

Add code
Jul 03, 2024
Viaarxiv icon

A New Perspective on Shampoo's Preconditioner

Add code
Jun 25, 2024
Viaarxiv icon

DataComp-LM: In search of the next generation of training sets for language models

Add code
Jun 18, 2024
Figure 1 for DataComp-LM: In search of the next generation of training sets for language models
Figure 2 for DataComp-LM: In search of the next generation of training sets for language models
Figure 3 for DataComp-LM: In search of the next generation of training sets for language models
Figure 4 for DataComp-LM: In search of the next generation of training sets for language models
Viaarxiv icon

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training

Add code
Jun 15, 2024
Viaarxiv icon

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Add code
Feb 27, 2024
Viaarxiv icon