Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nandan Sarkar

SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Nov 08, 2024

Shruti Singh, Nandan Sarkar, Arman Cohan

Figure 1 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Figure 2 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Figure 3 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Figure 4 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Abstract:Scientific literature is typically dense, requiring significant background knowledge and deep comprehension for effective engagement. We introduce SciDQA, a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles, consisting of 2,937 QA pairs. Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors, ensuring a thorough examination of the literature. We enhance the dataset's quality through a process that carefully filters out lower quality questions, decontextualizes the content, tracks the source document across different versions, and incorporates a bibliography for multi-document question-answering. Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials, and require multi-document reasoning. We evaluate several open-source and proprietary LLMs across various configurations to explore their capabilities in generating relevant and factual responses. Our comprehensive evaluation, based on metrics for surface-level similarity and LLM judgements, highlights notable performance discrepancies. SciDQA represents a rigorously curated, naturally derived scientific QA dataset, designed to facilitate research on complex scientific text understanding.

* 18 pages, Accepted to EMNLP 2024

Via

Access Paper or Ask Questions