
Mitchell Wortsman

Scaling Exponents Across Parameterizations and Optimizers

Jul 08, 2024

Resolving Discrepancies in Compute-Optimal Scaling of Language Models

Jun 27, 2024

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024

Language models scale reliably with over-training and on downstream tasks

Mar 13, 2024

OLMo: Accelerating the Science of Language Models

Feb 07, 2024

Small-scale proxies for large-scale Transformer training instabilities

Sep 25, 2023

Replacing softmax with ReLU in Vision Transformers

Sep 15, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Aug 07, 2023

DataComp: In search of the next generation of multimodal datasets

May 03, 2023

Stable and low-precision training for large-scale vision-language models

Apr 25, 2023