Title: Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

Aug 07, 2021

Bryan Wang, Gang Li, Xin Zhou, Zhourong Chen, Tovi Grossman, Yang Li

Abstract: Mobile User Interface Summarization generates succinct language descriptions of mobile screens that convey the important contents and functionalities of the screen, which can be useful for many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates essential information of a UI screen into a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multi-modal data of mobile UIs, including text, images, and structures, as well as UI semantics, which motivates our multi-modal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across ~22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these models with both automatic accuracy metrics and human ratings shows that our approach can generate high-quality summaries for mobile screens. We demonstrate potential use cases of Screen2Words and open-source our dataset and model to lay the foundations for further bridging language and user interfaces.

* UIST'21
