Yikun Ban

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

Mar 09, 2026, via arXiv

Heterogeneous Agent Collaborative Reinforcement Learning

Mar 03, 2026, via arXiv

UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents

Feb 27, 2026, via arXiv

UniARM: Towards a Unified Autoregressive Reward Model for Multi-Objective Test-Time Alignment

Feb 10, 2026, via arXiv

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Feb 09, 2026, via arXiv

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Feb 09, 2026, via arXiv

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Feb 09, 2026, via arXiv

Real-Time Aligned Reward Model beyond Semantics

Jan 30, 2026, via arXiv

Your Group-Relative Advantage Is Biased

Jan 13, 2026, via arXiv

Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process

Dec 29, 2025, via arXiv