Kaiyue Wen

Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective

Oct 07, 2024

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

Oct 07, 2024

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

Feb 29, 2024

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars

Dec 03, 2023

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Jul 23, 2023

Practically Solving LPN in High Noise Regimes Faster Using Neural Networks

Mar 14, 2023

Finding Skill Neurons in Pre-trained Transformer-based Language Models

Nov 14, 2022

How Does Sharpness-Aware Minimization Minimize Sharpness?

Nov 10, 2022

Realistic Deep Learning May Not Fit Benignly

Jun 01, 2022