Chulhee Yun

Provable Benefit of Cutout and CutMix for Feature Learning

Oct 31, 2024

DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity

Oct 30, 2024

Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count

Oct 21, 2024

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

May 31, 2024

Does SGD really happen in tiny subspaces?

May 25, 2024

Fundamental Benefit of Alternating Updates in Minimax Optimization

Feb 16, 2024

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

Nov 25, 2023

Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

Oct 28, 2023

Linear attention is (maybe) all you need (to understand transformer optimization)

Oct 02, 2023

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory

Jul 09, 2023