Bingrui Li

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training

Oct 14, 2024

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Oct 07, 2024

Memory Efficient Optimizers with 4-bit States

Sep 06, 2023