Audrey Huang

Self-Improvement in Language Models: The Sharpening Mechanism

Dec 02, 2024

Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

Jul 18, 2024

Reinforcement Learning in Low-Rank MDPs with Density Features

Feb 04, 2023

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

Oct 27, 2022

Off-Policy Risk Assessment in Markov Decision Processes

Sep 21, 2022

Supervised Learning with General Risk Functionals

Jun 27, 2022

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Feb 11, 2022

Off-Policy Risk Assessment in Contextual Bandits

Apr 18, 2021

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk

Mar 05, 2021

Graph-Structured Visual Imitation

Jul 11, 2019