Subhojyoti Mukherjee

Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization

Dec 06, 2024
Via arXiv

Off-Policy Evaluation from Logged Human Feedback

Jun 14, 2024
Via arXiv

Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning

Jun 07, 2024
Via arXiv

SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

Jun 04, 2024
Via arXiv

Optimal Design for Human Feedback

Apr 22, 2024
Via arXiv

Experimental Design for Active Transductive Inference in Large Language Models

Apr 12, 2024
Via arXiv

Multi-task Representation Learning for Pure Exploration in Bilinear Bandits

Nov 01, 2023
Via arXiv

Efficient and Interpretable Bandit Algorithms

Oct 23, 2023
Via arXiv

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

Jan 29, 2023
Via arXiv

Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits

May 27, 2022
Via arXiv