Tokio Kajitsuka

Optimal Memorization Capacity of Transformers

Sep 26, 2024
Via arXiv

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?

Jul 26, 2023
Via arXiv