Jiarui Yao

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

Mar 14, 2026
Via arXiv

PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary

Jan 15, 2026
Via arXiv

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Oct 14, 2025
Via arXiv

Why is Your Language Model a Poor Implicit Reward Model?

Jul 10, 2025
Via arXiv

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

May 30, 2025
Via arXiv

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

May 05, 2025
Via arXiv

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Apr 15, 2025
Via arXiv

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4

Mar 05, 2025
Via arXiv

Rethinking Diverse Human Preference Learning through Principal Component Analysis

Feb 18, 2025
Via arXiv

EscapeBench: Pushing Language Models to Think Outside the Box

Dec 18, 2024
Via arXiv