Daseul Bae

CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only

Jun 11, 2024

Junhee Cho, Jihoon Kim, Daseul Bae, Jinho Choo, Youngjune Gwon, Yeong-Dae Kwon

Abstract: Software robots have long been deployed in Robotic Process Automation (RPA) to automate mundane and repetitive computer tasks. The advent of Large Language Models (LLMs) with advanced reasoning capabilities has set the stage for these agents to undertake more complex and even previously unseen tasks. However, the LLM-based automation techniques in recent literature frequently rely on HTML source code as input, limiting their application to web environments. Moreover, the information contained in HTML code is often inaccurate or incomplete, making the agent less reliable for practical applications. We propose an LLM-based agent that relies solely on screenshots to recognize its environment, while leveraging in-context learning to eliminate the need for collecting large datasets of human demonstrations. Our strategy, named Context-Aware Action Planning (CAAP) prompting, encourages the agent to meticulously review the context from various angles. With the proposed methodology, we achieve a success rate of 94.4% on 67 types of MiniWoB++ problems, using only 1.48 demonstrations per problem type. Our method offers the potential for broader applications, especially for tasks that require inter-application coordination on computers or smartphones, showcasing a significant advancement in the field of automation agents. Code and models are available at https://github.com/caap-agent/caap-agent.

* 10 pages, 5 figures (plus 19 pages and 6 figures in the appendix)
