Dong Yan

Boosting Deductive Reasoning with Step Signals In RLHF

Oct 12, 2024

Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown

Oct 01, 2024

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

Jun 11, 2024

Exploring the LLM Journey from Cognition to Expression with Linear Representations

May 27, 2024

SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling

May 21, 2024

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

Feb 20, 2024

Baichuan 2: Open Large-scale Language Models

Sep 20, 2023

Reward Informed Dreamer for Task Generalization in Reinforcement Learning

Mar 09, 2023

Model-based Reinforcement Learning with a Hamiltonian Canonical ODE Network

Nov 02, 2022

On the Reuse Bias in Off-Policy Reinforcement Learning

Sep 15, 2022