
Jonathan D. Chang

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Oct 06, 2024

Critique-out-Loud Reward Models

Aug 21, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Apr 25, 2024

Dataset Reset Policy Optimization for RLHF

Apr 15, 2024

Adversarial Imitation Learning via Boosting

Apr 12, 2024

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Mar 25, 2024

Policy-Gradient Training of Language Models for Ranking

Oct 06, 2023

Learning to Generate Better Than Your LLM

Jun 20, 2023

Learning Bellman Complete Representations for Offline Policy Evaluation

Jul 12, 2022

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

Jun 14, 2021