Felix Dangel

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Oct 14, 2024
Via arXiv

Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning

Jun 05, 2024
Via arXiv

Lowering PyTorch's Memory Consumption for Selective Differentiation

Apr 15, 2024
Via arXiv

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

Feb 13, 2024
Via arXiv

Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets

Dec 16, 2023
Via arXiv

On the Disconnect Between Theory and Practice of Overparametrized Neural Networks

Sep 29, 2023
Via arXiv

Convolutions Through the Lens of Tensor Networks

Jul 05, 2023
Via arXiv

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

Feb 14, 2023
Via arXiv

ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

Jun 04, 2021
Via arXiv

Cockpit: A Practical Debugging Tool for Training Deep Neural Networks

Feb 12, 2021
Via arXiv