Xing Yu

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Feb 15, 2026
Via arXiv

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Feb 11, 2026
Via arXiv

DeepEyesV2: Toward Agentic Multimodal Model

Nov 10, 2025
Via arXiv

Towards Agentic Self-Learning LLMs in Search Environment

Oct 16, 2025
Via arXiv

DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning

May 20, 2025
Via arXiv

Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation

Apr 23, 2025
Via arXiv

Think When You Need: Self-Adaptive Chain-of-Thought Learning

Apr 04, 2025
Via arXiv

Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model

Mar 28, 2025
Via arXiv

Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

Feb 24, 2025
Via arXiv

Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?

Oct 08, 2024
Via arXiv