
Minhak Song

Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More

Jun 07, 2025
Via arXiv

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

May 26, 2025
Via arXiv

Does SGD really happen in tiny subspaces?

May 25, 2024
Via arXiv

Linear attention is (maybe) all you need (to understand transformer optimization)

Oct 02, 2023
Via arXiv

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory

Jul 09, 2023
Via arXiv