Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qinzhuo Wu

BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism

May 27, 2025

Qinzhuo Wu, Pengzhi Gao, Wei Liu, Jian Luan

Abstract:Graphical User Interface (GUI) agents have gained substantial attention due to their impressive capabilities to complete tasks through multiple interactions within GUI environments. However, existing agents primarily focus on enhancing the accuracy of individual actions and often lack effective mechanisms for detecting and recovering from errors. To address these shortcomings, we propose the BacktrackAgent, a robust framework that incorporates a backtracking mechanism to improve task completion efficiency. BacktrackAgent includes verifier, judger, and reflector components as modules for error detection and recovery, while also applying judgment rewards to further enhance the agent's performance. Additionally, we develop a training dataset specifically designed for the backtracking mechanism, which considers the outcome pages after action executions. Experimental results show that BacktrackAgent has achieved performance improvements in both task success rate and step accuracy on Mobile3M and Auto-UI benchmarks. Our data and code will be released upon acceptance.

Via

Access Paper or Ask Questions

ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

Feb 05, 2025

Qinzhuo Wu, Wei Liu, Jian Luan, Bin Wang

Figure 1 for ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

Figure 2 for ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

Figure 3 for ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

Figure 4 for ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

Abstract:Recently, mobile AI agents have gained increasing attention. Given a task, mobile AI agents can interact with mobile devices in multiple steps and finally form a GUI flow that solves the task. However, existing agents tend to focus on most task-relevant elements at each step, leading to local optimal solutions and ignoring the overall GUI flow. To address this issue, we constructed a training dataset called MobileReach, which breaks the task into page reaching and operation subtasks. Furthermore, we propose ReachAgent, a two-stage framework that focuses on improving its task-completion abilities. It utilizes the page reaching and page operation subtasks, along with reward-based preference GUI flows, to further enhance the agent. Experimental results show that ReachAgent significantly improves the IoU Acc and Text Acc by 7.12% and 7.69% on the step-level and 4.72% and 4.63% on the task-level compared to the SOTA agent. Our data and code will be released upon acceptance.

Via

Access Paper or Ask Questions

TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

Apr 06, 2021

Tao Gui, Xiao Wang, Qi Zhang, Qin Liu, Yicheng Zou, Xin Zhou, Rui Zheng, Chong Zhang, Qinzhuo Wu, Jiacheng Ye(+24 more)

Figure 1 for TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

Figure 2 for TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

Figure 3 for TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

Figure 4 for TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

Abstract:Various robustness evaluation methodologies from different perspectives have been proposed for different natural language processing (NLP) tasks. These methods have often focused on either universal or task-specific generalization capabilities. In this work, we propose a multilingual robustness evaluation platform for NLP tasks (TextFlint) that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis. TextFlint enables practitioners to automatically evaluate their models from all aspects or to customize their evaluations as desired with just a few lines of code. To guarantee user acceptability, all the text transformations are linguistically based, and we provide a human evaluation for each one. TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness. To validate TextFlint's utility, we performed large-scale empirical evaluations (over 67,000 evaluations) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. Almost all models showed significant performance degradation, including a decline of more than 50% of BERT's prediction accuracy on tasks such as aspect-level sentiment classification, named entity recognition, and natural language inference. Therefore, we call for the robustness to be included in the model evaluation, so as to promote the healthy development of NLP technology.

Via

Access Paper or Ask Questions