Dhawal Gupta

A Safe Exploration Strategy for Model-free Task Adaptation in Safety-constrained Grid Environments

Aug 02, 2024

ICU-Sepsis: A Benchmark MDP Built from Real Medical Data

Jun 09, 2024

From Past to Future: Rethinking Eligibility Traces

Dec 20, 2023

Behavior Alignment via Reward Function Optimization

Oct 31, 2023

Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF

Sep 16, 2023

Coagent Networks: Generalized and Scaled

May 16, 2023

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Feb 21, 2023

Gradient Temporal-Difference Learning with Regularized Corrections

Jul 07, 2020