Riccardo Rende

A distributional simplicity bias in the learning dynamics of transformers

Oct 25, 2024
Via arXiv

Are queries and keys always relevant? A case study on Transformer wave functions

May 29, 2024
Via arXiv

Optimal inference of a generalised Potts model by single-layer transformers with factored attention

Apr 14, 2023
Via arXiv