
Zihang Dai

Transformer Quality in Linear Time

Feb 21, 2022

Combined Scaling for Zero-shot Transfer Learning

Nov 19, 2021

Primer: Searching for Efficient Transformers for Language Modeling

Sep 17, 2021

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

Aug 24, 2021

Combiner: Full Attention Transformer with Sparse Computation Cost

Jul 12, 2021

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Jun 09, 2021

Pay Attention to MLPs

Jun 01, 2021

Unsupervised Parallel Corpus Mining on Web Data

Sep 18, 2020

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Jun 05, 2020

Meta Pseudo Labels

Apr 23, 2020