Sergey Levine

UC Berkeley

Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Dec 17, 2024

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

Dec 13, 2024

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Dec 10, 2024

Predicting Emergent Capabilities by Finetuning

Nov 25, 2024

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?

Nov 12, 2024

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations

Nov 07, 2024

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Nov 07, 2024

Learning to Assist Humans without Inferring Rewards

Nov 04, 2024

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

Oct 31, 2024

Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Oct 29, 2024