Kun Shao

Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control

Oct 16, 2025, via arXiv

PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration

Aug 25, 2025, via arXiv

Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Temporal Grounding

Aug 08, 2025, via arXiv

SpatialViz-Bench: Automatically Generated Spatial Visualization Reasoning Tasks for MLLMs

Jul 10, 2025, via arXiv

AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search

Jun 06, 2025, via arXiv

ViMo: A Generative Visual GUI World Model for App Agent

Apr 15, 2025, via arXiv

VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT

Apr 06, 2025, via arXiv

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Feb 22, 2025, via arXiv

VSC-RL: Advancing Autonomous Vision-Language Agents with Variational Subgoal-Conditioned Reinforcement Learning

Feb 11, 2025, via arXiv

AppVLM: A Lightweight Vision Language Model for Online App Control

Feb 10, 2025, via arXiv