Atli Kosson

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Oct 31, 2024
Via arXiv

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

May 29, 2024
Via arXiv

Memory Efficient Mixed-Precision Optimizers

Sep 21, 2023
Via arXiv

Rotational Optimizers: Simple & Robust DNN Training

May 26, 2023
Via arXiv

Hardware-Efficient Transformer Training via Piecewise Affine Operations

May 26, 2023
Via arXiv

Ghost Noise for Regularizing Deep Neural Networks

May 26, 2023
Via arXiv

Adaptive Braking for Mitigating Gradient Delay

Jul 10, 2020
Via arXiv

Pipelined Backpropagation at Scale: Training Large Models without Batches

Mar 25, 2020
Via arXiv

Online Normalization for Training Neural Networks

May 28, 2019
Via arXiv