Shihan Dou

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026
Via arXiv

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Apr 15, 2026
Via arXiv

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

Apr 15, 2026
Via arXiv

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Apr 15, 2026
Via arXiv

A Decomposition Perspective to Long-context Reasoning for LLMs

Apr 09, 2026
Via arXiv

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

Mar 24, 2026
Via arXiv

Probing How Scalable Table Data Enhances General Long-Context Reasoning

Mar 23, 2026
Via arXiv

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

Feb 13, 2026
Via arXiv

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training

Feb 05, 2026
Via arXiv

Steering LLMs via Scalable Interactive Oversight

Feb 04, 2026
Via arXiv