Xing Yu

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

May 19, 2026

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

May 14, 2026

From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation

May 12, 2026

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

May 12, 2026

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

Mar 11, 2026

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Mar 04, 2026

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Feb 15, 2026

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Feb 11, 2026

DeepEyesV2: Toward Agentic Multimodal Model

Nov 10, 2025

Towards Agentic Self-Learning LLMs in Search Environment

Oct 16, 2025