Eran Malach

Loss-to-Loss Prediction: Scaling Laws for All Datasets

Nov 19, 2024
Via arXiv

Mixture of Parrots: Experts improve memorization more than reasoning

Oct 24, 2024
Via arXiv

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Oct 16, 2024
Via arXiv

Don't Stop Me Now: Embedding Based Scheduling for LLMs

Oct 01, 2024
Via arXiv

On the Power of Decision Trees in Auto-Regressive Language Modeling

Sep 27, 2024
Via arXiv

Universal Length Generalization with Turing Programs

Jul 03, 2024
Via arXiv

A New Perspective on Shampoo's Preconditioner

Jun 25, 2024
Via arXiv

Transcendence: Generative Models Can Outperform The Experts That Train Them

Jun 17, 2024
Via arXiv

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

Feb 16, 2024
Via arXiv

Repeat After Me: Transformers are Better than State Space Models at Copying

Feb 01, 2024
Via arXiv