Interaction platforms play a crucial role in recent advances in control and decision-making domains such as game playing and embodied intelligence. However, a satisfactory platform for information user interface (InfoUI) interaction is still lacking. An InfoUI comprises not only plain text but also multimodal content and some spatial structures with styles. To support research on InfoUI interaction, this paper presents a novel platform, Mobile-Env. Mobile-Env is designed to be flexible, adaptable, and easily extensible. Based on Mobile-Env, an InfoUI task set is built for demonstration and evaluation, and an agent based on a large-scale language model (LLM) is tested on it. The experimental results demonstrate the great potential of the LLM for text understanding and matching and, at the same time, reveal the need for better mechanisms for interaction feedback and exploration. Several further discussions are presented as well. A demo video is available at https://youtu.be/gKV6KZYwxGY. The code repository is available at https://github.com/X-LANCE/Mobile-Env. The proposed WikiHow task set is publicly available at https://huggingface.co/datasets/zdy023/WikiHow-taskset.