Aviral Kumar

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Feb 18, 2025

Value-Based Deep RL Scales Predictably

Feb 06, 2025

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Dec 18, 2024

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Dec 10, 2024

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

Dec 09, 2024

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?

Nov 12, 2024

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

Oct 17, 2024

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Oct 10, 2024

Generative Verifiers: Reward Modeling as Next-Token Prediction

Aug 27, 2024

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Aug 15, 2024