Libin Zhu

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Jul 29, 2024
Via arXiv

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

Jun 07, 2023
Via arXiv

Restricted Strong Convexity of Deep Learning Models with Smooth Activations

Sep 29, 2022
Via arXiv

A note on Linear Bottleneck networks and their Transition to Multilinearity

Jun 30, 2022
Via arXiv

Quadratic models for understanding neural network dynamics

May 24, 2022
Via arXiv

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

May 24, 2022
Via arXiv

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models

Mar 10, 2022
Via arXiv

On the linearity of large non-linear models: when and why the tangent kernel is constant

Oct 02, 2020
Via arXiv

Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning

Feb 29, 2020
Via arXiv