Benyou Wang

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Jun 16, 2026
Via arXiv

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Jun 12, 2026
Via arXiv

ProbRes: Volatility Learning for Probabilistic Time-Series Forecasting

Jun 01, 2026
Via arXiv

PhoneWorld: Scaling Phone-Use Agent Environments

May 28, 2026
Via arXiv

GlobalDentBench: A Multinational Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration

May 26, 2026
Via arXiv

CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

May 25, 2026
Via arXiv

HiMed: Incentivizing Hindi Reasoning in Medical LLMs

May 23, 2026
Via arXiv

Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model

May 14, 2026
Via arXiv

Do Phone-Use Agents Respect Your Privacy?

Apr 02, 2026
Via arXiv

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

Mar 29, 2026
Via arXiv