Zhipeng Wang

Mags-RL: Wearing Multimodal LLMs a Magnifying Glass via Agentic Reinforcement Learning For Complex Scene Reasoning

May 27, 2026 (via arXiv)

OphIn-500K: Curating Web-Scale Visual Instructions for Scaling Ophthalmic Multimodal Large Language Models

May 27, 2026 (via arXiv)

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

May 14, 2026 (via arXiv)

TIP: Token Importance in On-Policy Distillation

Apr 15, 2026 (via arXiv)

Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Apr 04, 2026 (via arXiv)

SODA: Semi On-Policy Black-Box Distillation for Large Language Models

Apr 04, 2026 (via arXiv)

Bridging Restoration and Diagnosis: A Comprehensive Benchmark for Retinal Fundus Enhancement

Apr 04, 2026 (via arXiv)

Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

Mar 25, 2026 (via arXiv)

PACED: Distillation and Self-Distillation at the Frontier of Student Competence

Mar 16, 2026 (via arXiv)

PACED: Distillation at the Frontier of Student Competence

Mar 11, 2026 (via arXiv)