Naman Agrawal

VeLO: Training Versatile Learned Optimizers by Scaling Up

Nov 17, 2022

Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts (+1 more)

Abstract: While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
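To make the phrase "a small neural network that ingests gradients and outputs parameter updates" concrete, here is a minimal sketch in plain Python. It is not the VeLO architecture or its released code: the function names (`init_tiny_mlp`, `mlp_update`, `learned_optimizer_step`), the two input features (gradient and momentum), the hidden size, and the `step_scale` factor are all assumptions made for illustration, and the meta-parameters here are random rather than meta-trained.

```python
import math
import random


def init_tiny_mlp(n_in, n_hidden, seed=0):
    """Create random (untrained) meta-parameters for a one-hidden-layer MLP.

    Hypothetical helper: in a real learned optimizer these weights would be
    the result of meta-training, not random initialization.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    return {"w1": w1, "b1": b1, "w2": w2}


def mlp_update(meta, features):
    """Map the per-parameter features to a single scalar update."""
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(meta["w1"], meta["b1"])
    ]
    return sum(w * h for w, h in zip(meta["w2"], hidden))


def learned_optimizer_step(meta, params, grads, momentum,
                           decay=0.9, step_scale=0.01):
    """Apply one step: the same small network is run on every parameter.

    The features fed to the network (raw gradient and an exponential moving
    average of gradients) are an assumption of this sketch.
    """
    new_momentum = [decay * m + (1.0 - decay) * g
                    for m, g in zip(momentum, grads)]
    new_params = [
        p + step_scale * mlp_update(meta, [g, m])
        for p, g, m in zip(params, grads, new_momentum)
    ]
    return new_params, new_momentum


# Usage: one step on a toy quadratic loss 0.5 * sum(p ** 2), whose gradient is p.
meta = init_tiny_mlp(n_in=2, n_hidden=4)
params = [1.0, -2.0, 0.5]
momentum = [0.0, 0.0, 0.0]
grads = list(params)
params, momentum = learned_optimizer_step(meta, params, grads, momentum)
```

Because the meta-parameters are random, this step is not expected to reduce the loss; the sketch only shows the data flow in which a network, rather than a hand-written rule such as SGD or Adam, turns gradient information into a parameter update. Meta-training would adjust `meta` so that these updates perform well across many tasks.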
