
Bingbing Xu

CAS Key Laboratory of AI Safety, Institute of Computing Technology, CAS, Beijing, China; Tsinghua University, Beijing, China

Towards Robust Process Reward Modeling via Noise-aware Learning

Jan 19, 2026

R$^2$PO: Decoupling Training Trajectories from Inference Responses for LLM Reasoning

Jan 17, 2026

Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems

Jan 16, 2026

ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding

Jan 15, 2026

Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents

Jan 14, 2026

Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents

Jan 12, 2026

GIFT: Games as Informal Training for Generalizable LLMs

Jan 09, 2026

HAG: Hierarchical Demographic Tree-based Agent Generation for Topic-Adaptive Simulation

Jan 09, 2026

Learning from Mistakes: Negative Reasoning Samples Enhance Out-of-Domain Generalization

Jan 08, 2026

From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment

Jun 14, 2025