Atli Kosson

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Oct 31, 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

May 29, 2024

Memory Efficient Mixed-Precision Optimizers

Sep 21, 2023

Rotational Optimizers: Simple & Robust DNN Training

May 26, 2023

Ghost Noise for Regularizing Deep Neural Networks

May 26, 2023

Hardware-Efficient Transformer Training via Piecewise Affine Operations

May 26, 2023

Adaptive Braking for Mitigating Gradient Delay

Jul 10, 2020

Pipelined Backpropagation at Scale: Training Large Models without Batches

Mar 25, 2020

Online Normalization for Training Neural Networks

May 28, 2019