Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Feb 20, 2023


View paper on arXiv or OpenReview
