Abstract:In this paper, we propose a method for incorporating world knowledge (linked entities and fine-grained entity types) into a neural question generation model. This world knowledge helps to encode additional information related to the entities present in the passage required to generate human-like questions. We evaluate our models on both SQuAD and MS MARCO to demonstrate the usefulness of the world knowledge features. The proposed world knowledge enriched question generation model is able to outperform the vanilla neural question generation model by 1.37 and 1.59 absolute BLEU 4 score on SQuAD and MS MARCO test dataset respectively.
Abstract:This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for this task. We show that Frames can also be used to study memory in dialogue management and information presentation through natural language generation.
Abstract:We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (0.198 in F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at https://datasets.maluuba.com/NewsQA.