Dylan J. Foster

Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification

Feb 18, 2025

Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning

Feb 12, 2025

Self-Improvement in Language Models: The Sharpening Mechanism

Dec 02, 2024

Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity

Oct 23, 2024

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

Oct 07, 2024

Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

Jul 20, 2024

Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

Jul 18, 2024

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

May 31, 2024

Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

May 29, 2024

The Power of Resets in Online Reinforcement Learning

Apr 26, 2024