Yihan Du

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Feb 15, 2024

Cascading Reinforcement Learning

Jan 17, 2024

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

Jul 06, 2023

Multi-task Representation Learning for Pure Exploration in Linear Bandits

Feb 09, 2023

Dueling Bandits: From Two-dueling to Multi-dueling

Nov 16, 2022

Risk-Sensitive Reinforcement Learning: Iterated CVaR and the Worst Path

Jun 06, 2022

Branching Reinforcement Learning

Feb 16, 2022

Collaborative Pure Exploration in Kernel Bandit

Oct 29, 2021

Combinatorial Pure Exploration with Bottleneck Reward Function and its Extension to General Reward Functions

Feb 24, 2021

Continuous Mean-Covariance Bandits

Feb 24, 2021