Chengzhuo Ni

Diffusion Model for Data-Driven Black-Box Optimization

Mar 20, 2024

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

Jul 13, 2023

Representation Learning for General-sum Low-rank Markov Games

Oct 30, 2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Jun 05, 2022

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Feb 10, 2022

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

Jan 31, 2022

Learning Good State and Action Representations via Tensor Decomposition

May 03, 2021

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Feb 17, 2021

Learning to Control in Metric Space with Optimal Regret

May 05, 2019