Deqing Wang

Heterogeneous Agent Collaborative Reinforcement Learning

Mar 03, 2026
Via arXiv

UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents

Feb 27, 2026
Via arXiv

UniARM: Towards a Unified Autoregressive Reward Model for Multi-Objective Test-Time Alignment

Feb 10, 2026
Via arXiv

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Feb 09, 2026
Via arXiv

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Feb 09, 2026
Via arXiv

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Feb 09, 2026
Via arXiv

Real-Time Aligned Reward Model beyond Semantics

Jan 30, 2026
Via arXiv

Your Group-Relative Advantage Is Biased

Jan 13, 2026
Via arXiv

LLMBoost: Make Large Language Models Stronger with Boosting

Dec 26, 2025
Via arXiv

FLeW: Facet-Level and Adaptive Weighted Representation Learning of Scientific Documents

Sep 09, 2025
Via arXiv