Title: Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales

Apr 29, 2023

Yiqun Yao, Yequan Wang

Abstract: As language models scale up, verifying research ideas becomes increasingly expensive, because conclusions drawn on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that directly predicts some metrics for large models based solely on the results and hyperparameters of small models. Existing methods based on scaling laws require a hyperparameter search on the largest models, which is impractical with limited resources. We address this issue by presenting our discoveries indicating that Maximal Update parametrization (muP) enables accurate fitting of scaling laws for hyperparameters close to common loss basins, without any search. Thus, different models can be compared directly at large scales through loss prediction, even before training starts. We propose a new paradigm as a first step towards reliable academic research at any model scale without heavy computation. Code will be publicly available shortly.
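
To make the idea of loss prediction concrete, here is a minimal sketch of fitting a scaling law on small models and extrapolating to a larger one. It is not the authors' method: the abstract does not state the functional form they fit, so the pure power law L(N) = a * N^(-b), the function names `fit_power_law` and `predict_loss`, and all numbers are assumptions chosen for illustration. The data is synthetic and is not a result from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) by least squares in log-log space.

    Assumed functional form for illustration only; the abstract does not
    specify which scaling law the authors fit.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = log(a) - b * log(size), so a = exp(intercept), b = -slope
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-b)


# Synthetic data generated from a known power law (hypothetical values,
# not measurements from the paper).
TRUE_A, TRUE_B = 20.0, 0.1
small_sizes = [1e6, 4e6, 1.6e7, 6.4e7]
small_losses = [TRUE_A * n ** (-TRUE_B) for n in small_sizes]

# Fit on the small models only, then extrapolate to a much larger size.
a, b = fit_power_law(small_sizes, small_losses)
predicted = predict_loss(a, b, 1e9)
print(f"fitted a={a:.4f}, b={b:.4f}, predicted loss at 1e9 params: {predicted:.4f}")
```

Because the synthetic losses lie exactly on a power law, the fit recovers the generating parameters. With real training runs the points would be noisy, and the quality of the extrapolation would depend on how well hyperparameters transfer across sizes, which is the role the abstract attributes to muP.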

* Updated figures and references
