Wentao Zhang

Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models

Jun 15, 2025
Via arXiv

AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving

Jun 14, 2025
Via arXiv

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Jun 09, 2025
Via arXiv

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

Jun 08, 2025
Via arXiv

Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification

Jun 08, 2025
Via arXiv

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Jun 05, 2025
Via arXiv

LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning

Jun 05, 2025
Via arXiv

From Chat Logs to Collective Insights: Aggregative Question Answering

May 29, 2025
Via arXiv

ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models

May 27, 2025
Via arXiv

STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization

May 26, 2025
Via arXiv