Arian Hosseini

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Oct 23, 2024

Not All LLM Reasoners Are Created Equal

Oct 02, 2024

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Aug 29, 2024

Generative Verifiers: Reward Modeling as Next-Token Prediction

Aug 27, 2024

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Mar 24, 2024

V-STaR: Training Verifiers for Self-Taught Reasoners

Feb 09, 2024

Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

Jun 21, 2023

On the Compositional Generalization Gap of In-Context Learning

Nov 15, 2022

Understanding by Understanding Not: Modeling Negation in Language Models

May 07, 2021

Ordered Memory

Nov 03, 2019