Zhaoran Wang

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

Jan 31, 2025

Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?

Jan 27, 2025

Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following

Dec 27, 2024

An Instrumental Value for Data Production and its Application to Data Pricing

Dec 24, 2024

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Nov 20, 2024

Language-Model-Assisted Bi-Level Programming for Reward Learning from Internet Videos

Oct 11, 2024

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Oct 10, 2024

Just say what you want: only-prompting self-rewarding online preference optimization

Sep 26, 2024

Safe MPC Alignment with Human Directional Feedback

Jul 05, 2024

Toward Optimal LLM Alignments Using Two-Player Games

Jun 16, 2024