Zhihan Xiong

Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration

Dec 13, 2024

Language Model Preference Evaluation with Multiple Weak Evaluators

Oct 14, 2024

Dual Approximation Policy Optimization

Oct 02, 2024

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Jul 27, 2023

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Jun 12, 2023

Offline congestion games: How feedback type affects data coverage requirement

Oct 24, 2022

Learning in Congestion Games with Bandit Feedback

Jun 04, 2022

Selective Sampling for Online Best-arm Identification

Nov 02, 2021

Randomized Exploration is Near-Optimal for Tabular MDP

Feb 19, 2021

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Dec 23, 2019