Alberto Bietti

How Truncating Weights Improves Reasoning in Language Models

Jun 05, 2024
Via arXiv

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

May 30, 2024
Via arXiv

Level Set Teleportation: An Optimization Perspective

Mar 05, 2024
Via arXiv

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Feb 29, 2024
Via arXiv

Learning Associative Memories with Gradient Descent

Feb 28, 2024
Via arXiv

On Learning Gaussian Multi-index Models with Gradient Flow

Nov 02, 2023
Via arXiv

xVal: A Continuous Number Encoding for Large Language Models

Oct 04, 2023
Via arXiv

AstroCLIP: Cross-Modal Pre-Training for Astronomical Foundation Models

Oct 04, 2023
Via arXiv

Multiple Physics Pretraining for Physical Surrogate Models

Oct 04, 2023
Via arXiv

Scaling Laws for Associative Memories

Oct 04, 2023
Via arXiv