Hengquan Guo

Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization

Oct 25, 2024
Via arXiv

Learning to Schedule Online Tasks with Bandit Feedback

Feb 26, 2024
Via arXiv

Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed Bandit with Constraints

Nov 29, 2022
Via arXiv