Matteo Pagliardini

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Feb 27, 2025

Leveraging the true depth of LLMs

Feb 05, 2025

The AdEMAMix Optimizer: Better, Faster, Older

Sep 05, 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

Feb 04, 2024

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Nov 27, 2023

DoGE: Domain Reweighting with Generalization Estimation

Oct 23, 2023

CoTFormer: More Tokens With Attention Make Up For Less Depth

Oct 16, 2023

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

Jun 01, 2023

Revisiting the ACVI Method for Constrained Variational Inequalities

Oct 27, 2022

Improving Generalization via Uncertainty Driven Perturbations

Feb 28, 2022