Lior Shani

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024

Embedding-Aligned Language Models

May 24, 2024

Multi-turn Reinforcement Learning from Preference Human Feedback

May 23, 2024

Demystifying Embedding Spaces using Large Language Models

Oct 06, 2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

May 31, 2023

Reinforcement Learning with History-Dependent Dynamic Contexts

Feb 04, 2023

Reinforcement Learning with a Terminator

May 30, 2022

Online Apprenticeship Learning

Feb 13, 2021

Mirror Descent Policy Optimization

Jun 09, 2020

Optimistic Policy Optimization with Bandit Feedback

Feb 19, 2020