Rahul Jain

Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning

May 25, 2025
Via arXiv

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

May 16, 2025
Via arXiv

Distributionally Robust Direct Preference Optimization

Feb 04, 2025
Via arXiv

Best Policy Learning from Trajectory Preference Feedback

Jan 31, 2025
Via arXiv

Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning

Aug 17, 2024
Via arXiv

Online Bandit Learning with Offline Preference Data

Jun 13, 2024
Via arXiv

e-COP : Episodic Constrained Optimization of Policies

Jun 13, 2024
Via arXiv

Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget

May 23, 2024
Via arXiv

Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach

Oct 17, 2023
Via arXiv

Regret Analysis of the Posterior Sampling-based Learning Algorithm for Episodic POMDPs

Oct 16, 2023
Via arXiv