Andrea Zanette

Fast Best-of-N Decoding via Speculative Rejection

Oct 26, 2024
Via arXiv

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Feb 29, 2024
Via arXiv

Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

Feb 24, 2024
Via arXiv

Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data

Jul 10, 2023
Via arXiv

When is Realizability Sufficient for Off-Policy Reinforcement Learning?

Nov 10, 2022
Via arXiv

Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

Jun 01, 2022
Via arXiv

Bellman Residual Orthogonalization for Offline Reinforcement Learning

Mar 24, 2022
Via arXiv

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Aug 19, 2021
Via arXiv

Design of Experiments for Stochastic Contextual Linear Bandits

Jul 22, 2021
Via arXiv

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

Mar 24, 2021
Via arXiv