Miao Lu

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

Apr 04, 2024

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates

Oct 26, 2023

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration

May 29, 2023

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

May 16, 2023

Robust Consensus Clustering and its Applications for Advertising Forecasting

Dec 27, 2022

Video Background Music Generation: Dataset, Method and Evaluation

Nov 21, 2022

Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach

Sep 12, 2022

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

May 26, 2022

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

Apr 14, 2022