Subhojyoti Mukherjee

Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization

Dec 06, 2024

Off-Policy Evaluation from Logged Human Feedback

Jun 14, 2024

Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning

Jun 07, 2024

SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

Jun 04, 2024

Optimal Design for Human Feedback

Apr 22, 2024

Experimental Design for Active Transductive Inference in Large Language Models

Apr 12, 2024

Multi-task Representation Learning for Pure Exploration in Bilinear Bandits

Nov 01, 2023

Efficient and Interpretable Bandit Algorithms

Oct 23, 2023

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

Jan 29, 2023

Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits

May 27, 2022