Picture for Shane Bergsma

Shane Bergsma

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Add code
Nov 01, 2024
Viaarxiv icon

Sparse maximal update parameterization: A holistic approach to sparse training dynamics

Add code
May 24, 2024
Viaarxiv icon

SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting

Add code
Dec 22, 2023
Viaarxiv icon

C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting

Add code
Dec 22, 2023
Viaarxiv icon