Tianle Dai

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI

May 23, 2022

Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu

Abstract: Task-oriented dialogue (TOD) systems have been widely used by mobile phone intelligent assistants to accomplish tasks such as calendar scheduling or hotel booking. Current TOD systems usually focus on multi-turn text/speech interaction and rely on calling back-end APIs to search database information or execute the task on the mobile phone. However, this architecture greatly limits the information-searching capability of intelligent assistants and may even lead to task failure if APIs are not available or the task is too complicated to be executed by the provided APIs. In this paper, we propose a new TOD architecture: the GUI-based task-oriented dialogue system (GUI-TOD). A GUI-TOD system can directly perform GUI operations on real apps and execute tasks without invoking back-end APIs. Furthermore, we release META-GUI, a dataset for training a multi-modal conversational agent on mobile GUI. We also propose a multi-modal action prediction and response model. It shows promising results on META-GUI, but there is still room for further improvement. The dataset and models will be publicly available.

* 14 pages, 10 figures
