Joey Hong

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations

Nov 07, 2024
Via arXiv

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Nov 07, 2024
Via arXiv

Strategically Conservative Q-Learning

Jun 06, 2024
Via arXiv

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Nov 30, 2023
Via arXiv

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

Nov 09, 2023
Via arXiv

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

Oct 31, 2023
Via arXiv

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

Jul 26, 2023
Via arXiv

Learning to Influence Human Behavior with Offline Reinforcement Learning

Mar 10, 2023
Via arXiv

On the Sensitivity of Reward Inference to Misspecified Human Models

Dec 09, 2022
Via arXiv

Multi-Task Off-Policy Learning from Bandit Feedback

Dec 09, 2022
Via arXiv