Picture for Andrea Michi

Andrea Michi

Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Add code
Jul 22, 2024
Figure 1 for Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Figure 2 for Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Figure 3 for Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Figure 4 for Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Viaarxiv icon

BOND: Aligning LLMs with Best-of-N Distillation

Add code
Jul 19, 2024
Figure 1 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 2 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 3 for BOND: Aligning LLMs with Best-of-N Distillation
Figure 4 for BOND: Aligning LLMs with Best-of-N Distillation
Viaarxiv icon

Nash Learning from Human Feedback

Add code
Dec 06, 2023
Viaarxiv icon

Towards practical reinforcement learning for tokamak magnetic control

Add code
Jul 21, 2023
Figure 1 for Towards practical reinforcement learning for tokamak magnetic control
Figure 2 for Towards practical reinforcement learning for tokamak magnetic control
Figure 3 for Towards practical reinforcement learning for tokamak magnetic control
Figure 4 for Towards practical reinforcement learning for tokamak magnetic control
Viaarxiv icon

Hyperparameter Selection for Offline Reinforcement Learning

Add code
Jul 17, 2020
Figure 1 for Hyperparameter Selection for Offline Reinforcement Learning
Figure 2 for Hyperparameter Selection for Offline Reinforcement Learning
Figure 3 for Hyperparameter Selection for Offline Reinforcement Learning
Figure 4 for Hyperparameter Selection for Offline Reinforcement Learning
Viaarxiv icon