Which transformer architecture fits my data? A vocabulary bottleneck in self-attention

May 09, 2021
Figures 1-4 for Which transformer architecture fits my data? A vocabulary bottleneck in self-attention

View paper on arXiv