Title: When Can You Get Away with Low Memory Adam?

Mar 03, 2025

Dayal Singh Kalra, John Kirchenbauer, Maissam Barkeshli, Tom Goldstein

Abstract: Adam is the go-to optimizer for training modern machine learning models, but it requires additional memory to maintain the moving averages of the gradients and their squares. While various low-memory optimizers have been proposed that sometimes match the performance of Adam, their lack of reliability has left Adam as the default choice. In this work, we apply a simple layer-wise Signal-to-Noise Ratio (SNR) analysis to quantify when second-moment tensors can be effectively replaced by their means across different dimensions. Our SNR analysis reveals how architecture, training hyperparameters, and dataset properties impact compressibility along Adam's trajectory, naturally leading to $\textit{SlimAdam}$, a memory-efficient Adam variant. $\textit{SlimAdam}$ compresses the second moments along dimensions with high SNR when feasible, and leaves them uncompressed when compression would be detrimental. Through experiments across a diverse set of architectures and training scenarios, we show that $\textit{SlimAdam}$ matches Adam's performance and stability while saving up to $98\%$ of total second moments. Code for $\textit{SlimAdam}$ is available at https://github.com/dayal-kalra/low-memory-adam.

* 9+17 pages, 11+19 figures
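
The idea in the abstract of replacing a second-moment tensor by its mean along a dimension when the SNR is high can be illustrated with a small sketch. This is not the authors' implementation: the SNR definition used here (squared mean divided by variance, averaged over slices), the threshold value, and the helper names `snr` and `compress` are all assumptions made for illustration, and the "tensor" is a plain 2-D list.

```python
from statistics import fmean, pvariance


def _slices(v, axis):
    # axis=1 groups entries row by row; axis=0 groups them column by column.
    return [list(r) for r in v] if axis == 1 else [list(c) for c in zip(*v)]


def snr(v, axis):
    """Assumed SNR: mean^2 / variance per slice, averaged over all slices."""
    ratios = []
    for s in _slices(v, axis):
        m = fmean(s)
        # Small constant avoids division by zero for constant slices.
        ratios.append(m * m / (pvariance(s, mu=m) + 1e-30))
    return fmean(ratios)


def compress(v, axis, threshold):
    """Replace each slice by its mean if the SNR is high, else return v unchanged."""
    if snr(v, axis) < threshold:
        return v  # compression judged detrimental under this assumed criterion
    return [fmean(s) for s in _slices(v, axis)]


# Hypothetical second-moment values: nearly constant within rows, very different across rows.
v = [[1.0, 1.1, 0.9],
     [4.0, 4.2, 3.8]]

row_compressed = compress(v, axis=1, threshold=10.0)  # one value per row
col_attempt = compress(v, axis=0, threshold=10.0)     # left as the full matrix
```

In this toy case the entries within each row are close to their mean, so storing one number per row loses little, while averaging down the columns would mix values of very different scale and is skipped.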
