Stan van Wingerden

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

Oct 03, 2024
Via arXiv