Anh-Khoa Duong Nguyen

Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering

Feb 12, 2023

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa

Abstract: To explain the predicted answers and evaluate the reasoning abilities of models, several studies have utilized underlying reasoning (UR) tasks in multi-hop question answering (QA) datasets. However, it remains an open question how effective UR tasks are for the QA task when models are trained on both tasks in an end-to-end manner. In this study, we address this question by analyzing the effectiveness of UR tasks (including both sentence-level and entity-level tasks) in three aspects: (1) QA performance, (2) reasoning shortcuts, and (3) robustness. Whereas previous models have not been explicitly trained on an entity-level reasoning prediction task, we build a multi-task model that performs three tasks together: sentence-level supporting facts prediction, entity-level reasoning prediction, and answer prediction. Experimental results on the 2WikiMultiHopQA and HotpotQA-small datasets reveal that (1) UR tasks can improve QA performance. Using four newly created debiased datasets, we demonstrate that (2) UR tasks are helpful in preventing reasoning shortcuts in the multi-hop QA task. However, we find that (3) UR tasks do not contribute to improving the robustness of the model on adversarial questions, such as sub-questions and inverted questions. We encourage future studies to investigate the effectiveness of entity-level reasoning in the form of natural language questions (e.g., sub-question forms).

* Accepted by EACL 2023 (Findings)
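The abstract describes one model trained end-to-end on three tasks at once. A common way to train such a model is to add the per-task losses into a single objective. The sketch below illustrates that idea only; it is not the authors' implementation. The function names, the equal default weights, and the simplifications (answer prediction as classification over candidates, both UR tasks as binary tagging of candidate sentences or entities) are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mean_binary_cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Average binary cross-entropy over candidate units (sentences or entities)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def multitask_loss(
    answer_logits: Sequence[float],
    answer_idx: int,
    sent_probs: Sequence[float],
    sent_labels: Sequence[int],
    ent_probs: Sequence[float],
    ent_labels: Sequence[int],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weighting
) -> float:
    """Weighted sum of answer, supporting-fact, and entity-level reasoning losses."""
    w_ans, w_sent, w_ent = weights
    l_ans = cross_entropy(answer_logits, answer_idx)
    l_sent = mean_binary_cross_entropy(sent_probs, sent_labels)  # sentence-level UR task
    l_ent = mean_binary_cross_entropy(ent_probs, ent_labels)     # entity-level UR task
    return w_ans * l_ans + w_sent * l_sent + w_ent * l_ent
```

With uniform predictions, each of the three terms equals log 2, so the total is 3 log 2. Setting one weight to zero drops that task from training, which is one simple way to compare a QA-only model against one that also learns UR tasks.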

Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

Nov 12, 2020

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa

Abstract: A multi-hop question answering (QA) dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation of the reasoning process from the question to the answer. Further, previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question. In this study, we present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data. In our dataset, we introduce evidence information containing a reasoning path for multi-hop questions. The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model. We carefully design a pipeline and a set of templates for generating question-answer pairs that guarantee the multi-hop steps and the quality of the questions. We also exploit the structured format of Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and that it ensures multi-hop reasoning is required.

* Accepted by COLING 2020
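The abstract mentions two mechanisms: evidence information that records a reasoning path, and logical rules over structured Wikidata facts that yield questions requiring multi-hop reasoning. The sketch below illustrates how these could fit together, assuming evidence is stored as (subject, relation, object) triples and using the rule father(a, b) ∧ father(b, c) ⇒ grandfather(a, c) as an example. The function names, the question template, and the entities "Person A/B/C" are hypothetical placeholders, not the dataset's actual pipeline or data.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def compose_grandfather(evidence: List[Triple]) -> Optional[Triple]:
    """Apply the rule father(a, b) AND father(b, c) => grandfather(a, c)."""
    fathers = sorted((s, o) for s, r, o in evidence if r == "father")
    for a, b in fathers:
        for b2, c in fathers:
            if b == b2:  # chain the two hops through the shared entity b
                return (a, "grandfather", c)
    return None


def make_inference_question(
    evidence: List[Triple],
) -> Optional[Tuple[str, str, List[Triple]]]:
    """Return (question, answer, evidence) or None if the rule does not fire.

    The returned evidence triples form the reasoning path that explains
    the answer; the template wording here is hypothetical.
    """
    inferred = compose_grandfather(evidence)
    if inferred is None:
        return None
    a, _, c = inferred
    question = f"Who is the paternal grandfather of {a}?"
    return question, c, evidence


if __name__ == "__main__":
    ev = [("Person A", "father", "Person B"), ("Person B", "father", "Person C")]
    print(make_inference_question(ev))
```

Answering the generated question requires both hops (A's father, then that person's father), which is the property the abstract says the logical rules are meant to guarantee; the evidence list doubles as the gold reasoning path for evaluation.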
