Matteo Pagliardini

The AdEMAMix Optimizer: Better, Faster, Older

Sep 05, 2024
Via arXiv

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

Feb 04, 2024
Via arXiv

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Nov 27, 2023
Via arXiv

DoGE: Domain Reweighting with Generalization Estimation

Oct 23, 2023
Via arXiv

CoTFormer: More Tokens With Attention Make Up For Less Depth

Oct 16, 2023
Via arXiv

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

Jun 01, 2023
Via arXiv

Revisiting the ACVI Method for Constrained Variational Inequalities

Oct 27, 2022
Via arXiv

Improving Generalization via Uncertainty Driven Perturbations

Feb 28, 2022
Via arXiv

Agree to Disagree: Diversity through Disagreement for Better Transferability

Feb 09, 2022
Via arXiv

The Peril of Popular Deep Learning Uncertainty Estimation Methods

Dec 09, 2021
Via arXiv