Michael E. Sander

Towards Understanding the Universality of Transformers for Next-Token Prediction

Oct 03, 2024

How do Transformers perform In-Context Autoregressive Learning?

Feb 08, 2024

Implicit regularization of deep residual networks towards neural ODEs

Sep 03, 2023

Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective

Feb 06, 2023

Vision Transformers provably learn spatial structure

Oct 13, 2022

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

May 29, 2022

Sinkformers: Transformers with Doubly Stochastic Attention

Oct 22, 2021

Momentum Residual Neural Networks

Feb 15, 2021