Samy Jelassi

DMA, CIMS

Collective Model Intelligence Requires Compatible Specialization

Nov 04, 2024

Mixture of Parrots: Experts improve memorization more than reasoning

Oct 24, 2024

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

Oct 16, 2024

Universal Length Generalization with Turing Programs

Jul 03, 2024

How Does Overparameterization Affect Features?

Jul 01, 2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Feb 22, 2024

Repeat After Me: Transformers are Better than State Space Models at Copying

Feb 01, 2024

Length Generalization in Arithmetic Transformers

Jun 27, 2023

Depth Dependence of μP Learning Rates in ReLU MLPs

May 13, 2023

Vision Transformers provably learn spatial structure

Oct 13, 2022