Title: WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Jan 23, 2021

Lu Chen, Xingyu Chen, Zihan Zhao, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, Kai Yu

Abstract: Web search is an essential way for humans to obtain information, but understanding the contents of web pages remains a great challenge for machines. In this paper, we introduce the task of web-based structural reading comprehension. Given a web page and a question about it, the task is to find the answer on the web page. This task requires a system to understand not only the semantics of the text but also the structure of the web page. Moreover, we propose WebSRC, a novel Web-based Structural Reading Comprehension dataset. WebSRC consists of 0.44M question-answer pairs, collected from 6.5K web pages with corresponding HTML source code, screenshots, and metadata. Each question in WebSRC requires a certain structural understanding of a web page to answer, and the answer is either a text span on the web page or yes/no. We evaluate various strong baselines on our dataset to show the difficulty of our task. We also investigate the usefulness of structural information and visual features. Our dataset and task are publicly available at https://speechlab-sjtu.github.io/WebSRC/.

* 13 pages, 9 figures
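
The task format described in the abstract (a web page plus a question, with an answer that is either a text span on the page or yes/no) can be sketched roughly as follows. This is only an illustrative sketch, not the dataset's actual schema or loader: the `Example` class, its field names, and the helper functions are hypothetical, and the sample HTML is made up.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document.

    Note: this does not filter out <script> or <style> contents.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_text(html: str) -> str:
    """Return the page's text nodes joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


@dataclass
class Example:
    # Hypothetical field names; the released dataset may use a different schema.
    html: str
    question: str
    answer: str


def is_valid_answer(ex: Example) -> bool:
    """Check that the answer is yes/no or a text span found on the page."""
    if ex.answer.lower() in {"yes", "no"}:
        return True
    return ex.answer in page_text(ex.html)


# Made-up page: answering requires pairing a cell with its column header.
html = (
    "<table><tr><th>Model</th><th>Price</th></tr>"
    "<tr><td>Civic</td><td>$21,000</td></tr></table>"
)
example = Example(html, "What is the price of the Civic?", "$21,000")
print(is_valid_answer(example))  # True
```

The flattened text alone ("Model Price Civic $21,000") loses the row and column relationships, which is the kind of structural information the abstract says a system must understand.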
