Sam Shleifer (Shammie)

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
Apr 21, 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Jun 10, 2022

OPT: Open Pre-trained Transformer Language Models
May 05, 2022

Efficient Language Modeling with Sparse all-MLP
Mar 16, 2022

Efficient Large Scale Language Modeling with Mixtures of Experts
Dec 20, 2021

Few-shot Learning with Multilingual Language Models
Dec 20, 2021

NormFormer: Improved Transformer Pretraining with Extra Normalization
Nov 01, 2021

8-bit Optimizers via Block-wise Quantization
Oct 06, 2021

Pre-trained Summarization Distillation
Oct 28, 2020

Incrementally Improving Graph WaveNet Performance on Traffic Prediction
Dec 11, 2019