Felix Dangel

Spectral-factorized Positive-definite Curvature Learning for NN Training

Feb 10, 2025

Position: Curvature Matrices Should Be Democratized via Linear Operators

Jan 31, 2025

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Oct 14, 2024

Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning

Jun 05, 2024

Lowering PyTorch's Memory Consumption for Selective Differentiation

Apr 15, 2024

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

Feb 13, 2024

Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets

Dec 16, 2023

On the Disconnect Between Theory and Practice of Overparametrized Neural Networks

Sep 29, 2023

Convolutions Through the Lens of Tensor Networks

Jul 05, 2023

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

Feb 14, 2023