Kaifeng Lyu

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

Oct 14, 2024

AI-Assisted Generation of Difficult Math Questions

Jul 30, 2024

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

Feb 29, 2024

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Feb 28, 2024

Efficient Stagewise Pretraining via Progressive Subnetworks

Feb 08, 2024

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

Nov 30, 2023

A Quadratic Synchronization Rule for Distributed Deep Learning

Oct 22, 2023

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

Oct 12, 2023

The Marginal Value of Momentum for Small Learning Rate SGD

Jul 27, 2023