Yuanzhao Zhai

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Sep 14, 2024
Via arXiv

Online Self-Preferring Language Models

May 23, 2024
Via arXiv

COPR: Continual Human Preference Learning via Optimal Policy Regularization

Feb 27, 2024
Via arXiv

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization

Jan 11, 2024
Via arXiv

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles

Dec 30, 2023
Via arXiv

COPF: Continual Learning Human Preference through Optimal Policy Fitting

Oct 28, 2023
Via arXiv

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

Aug 24, 2022
Via arXiv

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

Aug 24, 2022
Via arXiv