Title: Who did What: A Large-Scale Person-Centered Cloze Dataset

Aug 19, 2016

Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel, David McAllester

Abstract: We have constructed a new "Who-did-What" (WDW) dataset of over 200,000 fill-in-the-gap (cloze) multiple-choice reading comprehension problems built from the LDC English Gigaword newswire corpus. The WDW dataset has a variety of novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015), we avoid using article summaries for question formation. Instead, each problem is formed from two independent articles: an article given as the passage to be read and a separate article on the same events used to form the question. Second, we avoid anonymization: each choice is a person named entity. Third, the problems have been filtered to remove a fraction that are easily solved by simple baselines, while remaining 84% solvable by humans. We report performance benchmarks of standard systems and propose the WDW dataset as a challenge task for the community.

* To appear at EMNLP 2016. Our dataset is available at tticnlp.github.io/who_did_what
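
To make the problem format and the filtering step in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the `ClozeProblem` fields, the `XXX` blank marker, the "most frequent name in the passage" baseline, and the `drop_prob` parameter are all hypothetical. The abstract says only that a fraction of problems solved by simple baselines was removed, without naming the baselines or the fraction.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClozeProblem:
    """One multiple-choice cloze problem (field names are assumptions)."""
    passage: str        # article given as the passage to be read
    question: str       # sentence from a separate article; "XXX" marks the blank
    choices: List[str]  # candidate person names (not anonymized)
    answer: str         # the correct choice


def most_frequent_baseline(problem: ClozeProblem) -> str:
    """Hypothetical simple baseline: pick the choice named most often in the passage.

    Ties go to the earliest choice in the list.
    """
    return max(problem.choices, key=lambda name: problem.passage.count(name))


def filter_easy(
    problems: List[ClozeProblem],
    baseline: Callable[[ClozeProblem], str],
    drop_prob: float,
    rng: random.Random,
) -> List[ClozeProblem]:
    """Drop each baseline-solved problem with probability drop_prob (assumed scheme)."""
    kept = []
    for problem in problems:
        solved = baseline(problem) == problem.answer
        if solved and rng.random() < drop_prob:
            continue  # easy problem, removed from the dataset
        kept.append(problem)
    return kept


passage = "Alice Smith met Bob Jones on Monday. Alice Smith spoke first."
easy = ClozeProblem(passage, "XXX opened the meeting.",
                    ["Alice Smith", "Bob Jones"], "Alice Smith")
hard = ClozeProblem(passage, "XXX listened to the opening remarks.",
                    ["Alice Smith", "Bob Jones"], "Bob Jones")

filtered = filter_easy([easy, hard], most_frequent_baseline,
                       drop_prob=1.0, rng=random.Random(0))
```

With `drop_prob=1.0` every problem the baseline solves is removed, so only `hard` survives. A value below 1.0 would remove only a fraction of them, which is closer to what the abstract describes.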
