Jingfeng Wu

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024
Via arXiv

Context-Scaling versus Task-Scaling in In-Context Learning

Oct 16, 2024
Via arXiv

Scaling Laws in Linear Regression: Compute, Parameters, and Data

Jun 12, 2024
Via arXiv

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

Jun 12, 2024
Via arXiv

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

Feb 24, 2024
Via arXiv

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Feb 22, 2024
Via arXiv

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

Nov 23, 2023
Via arXiv

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Oct 12, 2023
Via arXiv

Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

Jun 15, 2023
Via arXiv

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability

May 19, 2023
Via arXiv