Yash Chandak

Short-Long Policy Evaluation with Novel Actions

Jul 04, 2024
Via arXiv

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024
Via arXiv

Averaging log-likelihoods in direct alignment

Jun 27, 2024
Via arXiv

OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

May 27, 2024
Via arXiv

A/B testing under Interference with Partial Network Information

Apr 16, 2024
Via arXiv

Adaptive Instrument Design for Indirect Experiments

Dec 05, 2023
Via arXiv

Behavior Alignment via Reward Function Optimization

Oct 31, 2023
Via arXiv

Supervised Pretraining Can Learn In-Context Reinforcement Learning

Jun 26, 2023
Via arXiv

Coagent Networks: Generalized and Scaled

May 16, 2023
Via arXiv

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

May 02, 2023
Via arXiv