Jingfeng Wu

A Simplified Analysis of SGD for Linear Regression with Weight Averaging

Jun 18, 2025

Improved Scaling Laws in Linear Regression via Data Reuse

Jun 10, 2025

Memory-Statistics Tradeoff in Continual Learning with Structural Regularization

Apr 05, 2025

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes

Apr 05, 2025

Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks

Feb 22, 2025

Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

Feb 18, 2025

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024

Context-Scaling versus Task-Scaling in In-Context Learning

Oct 16, 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

Jun 12, 2024

Scaling Laws in Linear Regression: Compute, Parameters, and Data

Jun 12, 2024