Peng Shi

CAIRO: Decoupling Order from Scale in Regression

Feb 16, 2026
Via arXiv

TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution

Feb 10, 2026
Via arXiv

Flexible Entropy Control in RLVR with Gradient-Preserving Perspective

Feb 10, 2026
Via arXiv

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Feb 05, 2026
Via arXiv

MobileDreamer: Generative Sketch World Model for GUI Agent

Jan 07, 2026
Via arXiv

Learning When to Look: A Disentangled Curriculum for Strategic Perception in Multimodal Reasoning

Dec 19, 2025
Via arXiv

FutureWeaver: Planning Test-Time Compute for Multi-Agent Systems with Modularized Collaboration

Dec 12, 2025
Via arXiv

Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning

Oct 23, 2025
Via arXiv

Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning

Jun 16, 2025
Via arXiv

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

Apr 29, 2025
Via arXiv