Tuo Zhao

Teach Diffusion Language Models to Learn from Their Own Mistakes

Jan 10, 2026
Via arXiv

Ask a Strong LLM Judge when Your Reward Model is Uncertain

Oct 23, 2025
Via arXiv

OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment

Oct 09, 2025
Via arXiv

Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

May 22, 2025
Via arXiv

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Apr 20, 2025
Via arXiv

Adversarial Training of Reward Models

Apr 08, 2025
Via arXiv

IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining

Mar 07, 2025
Via arXiv

LLMs Can Generate a Better Answer by Aggregating Their Own Responses

Mar 06, 2025
Via arXiv

A Minimalist Example of Edge-of-Stability and Progressive Sharpening

Mar 04, 2025
Via arXiv

COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs

Feb 26, 2025
Via arXiv