
Weizhe Yuan

Bridging Offline and Online Reinforcement Learning for LLMs
Jun 26, 2025

An Overview of Large Language Models for Statisticians
Feb 25, 2025

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Feb 18, 2025

R.I.P.: Better Models by Survival of the Fittest Prompts
Jan 30, 2025

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Nov 25, 2024

Self-Consistency Preference Optimization
Nov 06, 2024

Thinking LLMs: General Instruction Following with Thought Generation
Oct 14, 2024

Self-Taught Evaluators
Aug 05, 2024

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Jul 28, 2024

Following Length Constraints in Instructions
Jun 25, 2024