Frederik Kunstner

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Feb 29, 2024

Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
Jun 05, 2023

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Apr 27, 2023

Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem
Nov 12, 2021

Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
Nov 02, 2020

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)
Jun 11, 2020

BackPACK: Packing more into backprop
Feb 15, 2020
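
BackPACK is a library that extends PyTorch's backward pass to extract additional quantities, such as per-sample gradients, alongside the usual averaged gradient. A minimal sketch of its use, assuming the backpack-for-pytorch package is installed (extend, backpack, and BatchGrad are part of its public interface):

import torch
from backpack import backpack, extend
from backpack.extensions import BatchGrad

# Wrap the model and loss function so BackPACK can hook into the backward pass.
model = extend(torch.nn.Linear(10, 2))
loss_func = extend(torch.nn.CrossEntropyLoss())

X = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

loss = loss_func(model(X), y)
with backpack(BatchGrad()):  # request individual gradients for this backward pass
    loss.backward()

for param in model.parameters():
    # .grad holds the usual averaged gradient; .grad_batch holds one gradient per sample,
    # with shape (batch_size, *param.shape).
    print(param.grad.shape, param.grad_batch.shape)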

Limitations of the Empirical Fisher Approximation
May 29, 2019

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
Nov 11, 2018