Title: AppAgent v2: Advanced Agent for Flexible Mobile Interactions

Aug 05, 2024

Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

Figures 1 to 4 for AppAgent v2: Advanced Agent for Flexible Mobile Interactions

Abstract: With the advancement of Multimodal Large Language Models (MLLMs), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. The framework navigates mobile devices by emulating human-like interactions. The agent constructs a flexible action space, drawing on parser, text, and vision descriptions, that enhances adaptability across various applications. The agent operates in two main phases: exploration and deployment. During the exploration phase, the functionalities of user interface elements are documented, through either agent-driven or manual exploration, in a customized structured knowledge base. In the deployment phase, retrieval-augmented generation (RAG) enables efficient retrieval from and updates to this knowledge base, empowering the agent to perform tasks effectively and accurately. These tasks include complex, multi-step operations across various applications, demonstrating the framework's adaptability and precision in handling customized task workflows. Experimental results across various benchmarks demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios. The code will be open-sourced soon.
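
The two-phase design described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which had not been released at the time of the listing. Every name here (`KnowledgeBase`, `ElementDoc`, `record`, `retrieve`, the element IDs) is hypothetical, and a bag-of-words cosine similarity stands in for whatever retrieval model the paper's RAG component actually uses.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class ElementDoc:
    """Hypothetical record describing what one UI element does."""
    element_id: str
    description: str


@dataclass
class KnowledgeBase:
    """Hypothetical structured store keyed by UI element ID."""
    docs: dict = field(default_factory=dict)

    def record(self, element_id: str, description: str) -> None:
        # Exploration phase: document an element, or overwrite (update)
        # an earlier entry for the same element.
        self.docs[element_id] = ElementDoc(element_id, description)

    def retrieve(self, task: str, k: int = 2) -> list:
        # Deployment phase: rank stored descriptions against the task text
        # and return the k most similar entries.
        query = tokenize(task)
        ranked = sorted(
            self.docs.values(),
            key=lambda d: cosine(query, tokenize(d.description)),
            reverse=True,
        )
        return ranked[:k]


# Exploration: entries written by an agent or by a human demonstrator.
kb = KnowledgeBase()
kb.record("btn_send", "Send the typed message")
kb.record("field_search", "Search contacts by name")
kb.record("btn_attach", "Attach a photo from the gallery")

# Deployment: look up the elements most relevant to the current task.
top = kb.retrieve("send a message to Alice", k=1)
print(top[0].element_id)  # prints: btn_send

# Deployment can also refine an entry based on what the agent observed.
kb.record("btn_send", "Send the typed message; disabled while the input is empty")
```

In a real agent, the retrieved descriptions would be placed into the model's prompt alongside the current screen; the sketch covers only the store, lookup, and update steps.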
