Hongyi Guo

Toward Optimal LLM Alignments Using Two-Player Games

Jun 16, 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

Apr 09, 2024

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

Mar 14, 2024

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

Mar 08, 2024

Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting

Feb 16, 2024

Human-Instruction-Free LLM Self-Alignment with Limited Samples

Jan 06, 2024

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Oct 11, 2023

Behavior Contrastive Learning for Unsupervised Skill Discovery

May 08, 2023

Policy Learning Using Weak Supervision

Oct 05, 2020