Dian Yu

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

Apr 01, 2025

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Feb 24, 2025

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Jan 30, 2025

OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas

Jan 26, 2025

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Dec 30, 2024

Teaching LLMs to Refine with Tools

Dec 22, 2024

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Oct 09, 2024

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Oct 04, 2024

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Aug 28, 2024

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Jun 30, 2024