Łukasz Kaiser

tsGT: Stochastic Time Series Modeling With Transformer

Mar 15, 2024

Sparse is Enough in Scaling Transformers

Nov 24, 2021

Hierarchical Transformers Are More Efficient Language Models

Oct 26, 2021

Q-Value Weighted Regression: Reinforcement Learning with Limited Data

Feb 12, 2021

Reformer: The Efficient Transformer

Feb 18, 2020

Universal Transformers

Jul 10, 2018

Image Transformer

Jun 15, 2018

Fast Decoding in Sequence Models using Discrete Latent Variables

Jun 07, 2018

Tensor2Tensor for Neural Machine Translation

Mar 16, 2018

Discrete Autoencoders for Sequence Models

Jan 29, 2018