
Rafael Rafailov

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Feb 24, 2025
Via arXiv

MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

Feb 03, 2025
Via arXiv

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Jan 08, 2025
Via arXiv

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Oct 22, 2024
Via arXiv

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Aug 15, 2024
Via arXiv

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Aug 13, 2024
Via arXiv

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Jul 24, 2024
Via arXiv

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024
Via arXiv

OpenVLA: An Open-Source Vision-Language-Action Model

Jun 13, 2024
Via arXiv

Scalable Ensembling For Mitigating Reward Overoptimisation

Jun 03, 2024
Via arXiv