Michael Noukhovitch

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Oct 23, 2024
Via arXiv

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Mar 24, 2024
Via arXiv

Language Model Alignment with Elastic Reset

Dec 06, 2023
Via arXiv

Learning to Communicate using Contrastive Learning

Jul 03, 2023
Via arXiv

Pretraining Representations for Data-Efficient Reinforcement Learning

Jun 09, 2021
Via arXiv

Emergent Communication under Competition

Jan 25, 2021
Via arXiv

Systematic Generalization: What Is Required and Can It Be Learned?

Nov 30, 2018
Via arXiv

Commonsense mining as knowledge base completion? A study on the impact of novelty

Apr 24, 2018
Via arXiv