Denny Wu

When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective

Mar 14, 2025

Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Feb 02, 2025

Pretrained transformer efficiently learns low-dimensional target functions in-context

Nov 04, 2024

Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

Aug 14, 2024

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

Jun 17, 2024

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Jun 03, 2024

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Feb 15, 2024

Gradient-Based Feature Learning under Structured Data

Sep 07, 2023

Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

Jun 12, 2023

Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

Mar 06, 2023