Zhaoran Wang

Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?

Jan 27, 2025

Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following

Dec 27, 2024

An Instrumental Value for Data Production and its Application to Data Pricing

Dec 24, 2024

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Nov 20, 2024

Language-Model-Assisted Bi-Level Programming for Reward Learning from Internet Videos

Oct 11, 2024

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Oct 10, 2024

Just say what you want: only-prompting self-rewarding online preference optimization

Sep 26, 2024

Safe MPC Alignment with Human Directional Feedback

Jul 05, 2024

Toward Optimal LLM Alignments Using Two-Player Games

Jun 16, 2024

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

May 29, 2024