Small-scale proxies for large-scale Transformer training instabilities

Add code
Sep 25, 2023
Figure 1 for Small-scale proxies for large-scale Transformer training instabilities
Figure 2 for Small-scale proxies for large-scale Transformer training instabilities
Figure 3 for Small-scale proxies for large-scale Transformer training instabilities
Figure 4 for Small-scale proxies for large-scale Transformer training instabilities

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: