
Zhihan Liu

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Oct 10, 2024

Just say what you want: only-prompting self-rewarding online preference optimization

Sep 26, 2024

Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations

Aug 28, 2024

Toward Optimal LLM Alignments Using Two-Player Games

Jun 16, 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

Mar 08, 2024

How Can LLM Guide RL? A Value-Based Approach

Feb 25, 2024

A Principled Framework for Knowledge-enhanced Large Language Model

Nov 18, 2023

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Oct 11, 2023

Sample-Efficient Multi-Agent RL: An Optimization Perspective

Oct 10, 2023