Alizée Pace

Uncertainty-Penalized Direct Preference Optimization

Oct 26, 2024
Via arXiv

Preference Elicitation for Offline Reinforcement Learning

Jun 26, 2024
Via arXiv

West-of-N: Synthetic Preference Generation for Improved Reward Modeling

Jan 22, 2024
Via arXiv

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series

Nov 15, 2023
Via arXiv

Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

Jun 01, 2023
Via arXiv

Temporal Label Smoothing for Early Prediction of Adverse Events

Aug 29, 2022
Via arXiv

POETREE: Interpretable Policy Learning with Adaptive Decision Trees

Mar 15, 2022
Via arXiv