Quanquan Gu

Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees

Feb 18, 2025
Via arXiv

Logarithmic Regret for Online KL-Regularized Reinforcement Learning

Feb 11, 2025
Via arXiv

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability

Feb 09, 2025
Via arXiv

Tensor Product Attention Is All You Need

Jan 11, 2025
Via arXiv

Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

Dec 27, 2024
Via arXiv

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Nov 15, 2024
Via arXiv

ProteinWeaver: A Divide-and-Assembly Approach for Protein Backbone Design

Nov 08, 2024
Via arXiv

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Nov 07, 2024
Via arXiv

CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing

Oct 22, 2024
Via arXiv

Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers

Oct 18, 2024
Via arXiv