Jingfeng Wu

Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks

Feb 22, 2025

Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

Feb 18, 2025

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024

Context-Scaling versus Task-Scaling in In-Context Learning

Oct 16, 2024

Scaling Laws in Linear Regression: Compute, Parameters, and Data

Jun 12, 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

Jun 12, 2024

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

Feb 24, 2024

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Feb 22, 2024

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

Nov 23, 2023

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Oct 12, 2023