Yassir Akram

Weight decay induces low-rank attention layers

Oct 31, 2024

Learning Randomized Algorithms with Transformers

Aug 20, 2024

When can transformers compositionally generalize in-context?

Jul 17, 2024

Attention as a Hypernetwork

Jun 09, 2024

Discovering modular solutions that generalize compositionally

Dec 22, 2023

Gated recurrent neural networks discover attention

Sep 04, 2023

Random initialisations performing above chance and how to find them

Sep 15, 2022