Nathan Godey

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Apr 11, 2024
Via arXiv

On the Scaling Laws of Geographical Representation in Language Models

Mar 04, 2024
Via arXiv

Anisotropy Is Inherent to Self-Attention in Transformers

Jan 24, 2024
Via arXiv

Headless Language Models: Learning without Predicting with Contrastive Weight Tying

Sep 15, 2023
Via arXiv

Is Anisotropy Inherent to Transformers?

Jun 13, 2023
Via arXiv

MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling

Dec 14, 2022
Via arXiv