Title: Dyson Brownian motion and random matrix dynamics of weight matrices during learning

Nov 20, 2024

Gert Aarts, Ouraman Hajizadeh, Biagio Lucini, Chanju Park

Abstract: During training, weight matrices in machine learning architectures are updated using stochastic gradient descent or variations thereof. In this contribution we employ concepts of random matrix theory to analyse the resulting stochastic matrix dynamics. We first demonstrate that the dynamics can generically be described using Dyson Brownian motion, leading to e.g. eigenvalue repulsion. The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size, explaining the empirically observed linear scaling rule. We verify this linear scaling in the restricted Boltzmann machine. Subsequently we study weight matrix dynamics in transformers (a nano-GPT), following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
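
As a rough illustration of the starting point mentioned in the abstract, the sketch below draws a Gaussian weight matrix and compares the eigenvalues of W^T W / M with the edges of the Marchenko-Pastur support. It is not taken from the paper: the matrix shape (M = 2000, N = 500), the unit variance, the seed, and the helper names `marchenko_pastur_edges` and `init_spectrum` are all assumptions chosen for the example.

```python
import numpy as np


def marchenko_pastur_edges(sigma2, q):
    """Edges of the Marchenko-Pastur support for aspect ratio q = N / M <= 1."""
    lower = sigma2 * (1.0 - np.sqrt(q)) ** 2
    upper = sigma2 * (1.0 + np.sqrt(q)) ** 2
    return lower, upper


def init_spectrum(M, N, sigma2=1.0, seed=0):
    """Eigenvalues of W^T W / M for an M x N matrix with i.i.d. Gaussian entries.

    Illustrative stand-in for a freshly initialised weight matrix (an assumption,
    not the initialisation scheme used in the paper).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(sigma2), size=(M, N))
    X = W.T @ W / M  # symmetric N x N matrix
    return np.linalg.eigvalsh(X)  # real eigenvalues in ascending order


M, N, sigma2 = 2000, 500, 1.0
eigs = init_spectrum(M, N, sigma2)
lower, upper = marchenko_pastur_edges(sigma2, N / M)

print(f"Marchenko-Pastur support: [{lower:.3f}, {upper:.3f}]")
print(f"empirical eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")
```

At finite size the empirical range only approximately matches the predicted support; the agreement should tighten as M and N grow at fixed N / M.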

* 7 pages. Contribution accepted in the NeurIPS 2024 workshop "Machine Learning and the Physical Sciences"
