Rafael Rafailov

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Oct 22, 2024

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Aug 15, 2024

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Aug 13, 2024

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Jul 24, 2024

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024

OpenVLA: An Open-Source Vision-Language-Action Model

Jun 13, 2024

Scalable Ensembling For Mitigating Reward Overoptimisation

Jun 03, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024

Efficient Imitation Learning with Conservative World Models

May 21, 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024