Quanquan Gu

Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models

Mar 24, 2026
Via arXiv

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Mar 02, 2026
Via arXiv

Dimension-Independent Convergence of Underdamped Langevin Monte Carlo in KL Divergence

Mar 02, 2026
Via arXiv

Protein Autoregressive Modeling via Multiscale Structure Generation

Feb 04, 2026
Via arXiv

Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics

Feb 02, 2026
Via arXiv

Deep Delta Learning

Jan 01, 2026
Via arXiv

Group Representational Position Encoding

Dec 08, 2025
Via arXiv

Higher-order Linear Attention

Oct 31, 2025
Via arXiv

Causal Attention with Lookahead Keys

Sep 09, 2025
Via arXiv

SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving

May 29, 2025
Via arXiv