Hongyi Guo

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

Jan 31, 2025

Toward Optimal LLM Alignments Using Two-Player Games

Jun 16, 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

Apr 09, 2024

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

Mar 14, 2024

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

Mar 08, 2024

Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting

Feb 16, 2024

Human-Instruction-Free LLM Self-Alignment with Limited Samples

Jan 06, 2024

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Oct 11, 2023

Behavior Contrastive Learning for Unsupervised Skill Discovery

May 08, 2023