Boyi Liu

Seed-CTS: Unleashing the Power of Tree Search for Superior Performance in Competitive Coding Tasks

Dec 17, 2024

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Nov 20, 2024

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Oct 10, 2024

BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

Oct 01, 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024

$\mathbf{(N, K)}$-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model

Mar 11, 2024

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

Feb 16, 2024

Improving Efficiency of DNN-based Relocalization Module for Autonomous Driving with Server-side Computing

Dec 01, 2023

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

Oct 30, 2023

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Oct 11, 2023