Zhipeng Wang

Rosetta Memory: Adaptive Memory for Cross-LLM Agents

Jun 05, 2026
Via arXiv

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

Jun 01, 2026
Via arXiv

Mags-RL: Wearing Multimodal LLMs a Magnifying Glass via Agentic Reinforcement Learning For Complex Scene Reasoning

May 27, 2026
Via arXiv

OphIn-500K: Curating Web-Scale Visual Instructions for Scaling Ophthalmic Multimodal Large Language Models

May 27, 2026
Via arXiv

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

May 14, 2026
Via arXiv

TIP: Token Importance in On-Policy Distillation

Apr 15, 2026
Via arXiv

Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Apr 04, 2026
Via arXiv

Bridging Restoration and Diagnosis: A Comprehensive Benchmark for Retinal Fundus Enhancement

Apr 04, 2026
Via arXiv

SODA: Semi On-Policy Black-Box Distillation for Large Language Models

Apr 04, 2026
Via arXiv

Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

Mar 25, 2026
Via arXiv