Zhihan Xiong

Language Model Preference Evaluation with Multiple Weak Evaluators

Oct 14, 2024
Via arXiv

Dual Approximation Policy Optimization

Oct 02, 2024
Via arXiv

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Jul 27, 2023
Via arXiv

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Jun 12, 2023
Via arXiv

Offline congestion games: How feedback type affects data coverage requirement

Oct 24, 2022
Via arXiv

Learning in Congestion Games with Bandit Feedback

Jun 04, 2022
Via arXiv

Selective Sampling for Online Best-arm Identification

Nov 02, 2021
Via arXiv

Randomized Exploration is Near-Optimal for Tabular MDP

Feb 19, 2021
Via arXiv

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Dec 23, 2019
Via arXiv