Bobak Shahriari

Capturing Individual Human Preferences with Reward Features

Mar 21, 2025, via arXiv

Preference Optimization as Probabilistic Inference

Oct 05, 2024, via arXiv

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024, via arXiv

Gemma: Open Models Based on Gemini Research and Technology

Mar 13, 2024, via arXiv

Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

May 09, 2023, via arXiv

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

Apr 22, 2022, via arXiv

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

Jun 15, 2021, via arXiv

Critic Regularized Regression

Jun 26, 2020, via arXiv

Acme: A Research Framework for Distributed Reinforcement Learning

Jun 01, 2020, via arXiv

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

Sep 03, 2019, via arXiv