Yunzhong He

Rubric-Guided Self-Distillation: Post-Training Without Rubric Verifiers

Jun 10, 2026

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

May 19, 2026

Reward Hacking in Rubric-Based Reinforcement Learning

May 12, 2026

Agentic Rubrics as Contextual Verifiers for SWE Agents

Jan 07, 2026

Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction

Dec 16, 2025

PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning

Nov 14, 2025

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

Oct 14, 2025

Online Rubrics Elicitation from Pairwise Comparisons

Oct 08, 2025

Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions

Jun 04, 2023

HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

Feb 22, 2023