Debmalya Mandal

Policy Teaching via Data Poisoning in Learning from Human Preferences

Mar 13, 2025
Via arXiv

Strategyproof Reinforcement Learning from Human Feedback

Mar 12, 2025
Via arXiv

Surprisingly Popular Voting for Concentric Rank-Order Models

Nov 13, 2024
Via arXiv

Performative Reinforcement Learning with Linear Markov Decision Process

Nov 07, 2024
Via arXiv

Symmetric Linear Bandits with Hidden Symmetry

May 22, 2024
Via arXiv

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Mar 04, 2024
Via arXiv

Corruption-Robust Offline Two-Player Zero-Sum Markov Games

Mar 04, 2024
Via arXiv

Performative Reinforcement Learning in Gradually Shifting Environments

Feb 15, 2024
Via arXiv

Learning the Expected Core of Strictly Convex Stochastic Cooperative Games

Feb 10, 2024
Via arXiv

Corruption Robust Offline Reinforcement Learning with Human Feedback

Feb 09, 2024
Via arXiv