Title: Is the Number of Trainable Parameters All That Actually Matters?

Sep 24, 2021

Amélie Chatelain, Amine Djeghri, Daniel Hesslow, Julien Launay, Iacopo Poli

Figures 1-4 for Is the Number of Trainable Parameters All That Actually Matters?

Abstract: Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these simple power laws across orders of magnitude in model scale provides compelling evidence that larger models are also more capable models. However, scaling up models under the constraints of hardware and infrastructure is no easy feat, and rapidly becomes a hard and expensive engineering problem. We investigate ways to tentatively cheat scaling laws, and train larger models for cheaper. We emulate an increase in effective parameters, using efficient approximations: either by doping the models with frozen random parameters, or by using fast structured transforms in place of dense linear layers. We find that the scaling relationship between test loss and compute depends only on the actual number of trainable parameters; scaling laws cannot be deceived by spurious parameters.
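
The two approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `DopedLayer` and `StructuredLayer`, the choice of a Walsh-Hadamard transform as the structured transform, and the row-wise split between trainable and frozen weights are all assumptions made here for illustration. The sketch only shows how the trainable parameter count can be far smaller than the count a dense layer of the same shape would have.

```python
import random


def fwht(vec):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


class DopedLayer:
    """Hypothetical linear layer whose output rows are partly trainable, partly frozen random."""

    def __init__(self, n_in, n_trainable_out, n_frozen_out, rng):
        self.n_in = n_in
        # Rows that an optimizer would update.
        self.trainable = [[rng.gauss(0.0, 1.0) for _ in range(n_in)]
                          for _ in range(n_trainable_out)]
        # Rows drawn once at initialization and never updated (assumed "doping").
        self.frozen = [[rng.gauss(0.0, 1.0) for _ in range(n_in)]
                       for _ in range(n_frozen_out)]

    def __call__(self, x):
        rows = self.trainable + self.frozen
        return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

    def param_counts(self):
        return {"trainable": len(self.trainable) * self.n_in,
                "frozen": len(self.frozen) * self.n_in}


class StructuredLayer:
    """Hypothetical structured layer y = H (d * x): trainable diagonal d, fixed Hadamard mix H."""

    def __init__(self, n, rng):
        self.diag = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the only trainable values

    def __call__(self, x):
        return fwht([d * xi for d, xi in zip(self.diag, x)])

    def param_counts(self):
        n = len(self.diag)
        return {"trainable": n, "dense_equivalent": n * n}


rng = random.Random(0)
doped = DopedLayer(n_in=8, n_trainable_out=2, n_frozen_out=6, rng=rng)
structured = StructuredLayer(n=8, rng=rng)
print(doped.param_counts())       # 16 trainable and 48 frozen weights
print(structured.param_counts())  # 8 trainable weights vs 64 for a dense 8x8 layer
```

In both toy layers the output has the same width as a dense 8x8 layer, yet only a fraction of the weights would receive gradient updates. The abstract's finding is that this trainable fraction, not the total, governs the loss-versus-compute scaling.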

View paper on OpenReview
