
Shengyi Huang

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Nov 22, 2024

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Oct 23, 2024

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Mar 24, 2024

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Feb 05, 2024

Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks

Oct 26, 2023

Zephyr: Direct Distillation of LM Alignment

Oct 25, 2023

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Sep 29, 2023

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Jun 21, 2022

A2C is a special case of PPO

May 18, 2022

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

Nov 16, 2021