Sayak Ray Chowdhury

Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift

Jul 26, 2024

Provably Robust DPO: Aligning Language Models with Noisy Feedback

Mar 01, 2024

Provably Sample Efficient RLHF via Active Preference Optimization

Feb 16, 2024

GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval

Oct 31, 2023

Differentially Private Reward Estimation with Preference Feedback

Oct 30, 2023

Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards

Jun 05, 2023

On Differentially Private Federated Linear Contextual Bandits

Feb 27, 2023

Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

Jul 23, 2022

Model Selection in Reinforcement Learning with General Function Approximations

Jul 06, 2022

Distributed Differential Privacy in Multi-Armed Bandits

Jun 12, 2022