Dimitris Pappas

Data Augmentation for Biomedical Factoid Question Answering

Apr 10, 2022

Dimitris Pappas, Prodromos Malakasiotis, Ion Androutsopoulos

Abstract: We study the effect of seven data augmentation (DA) methods in factoid question answering, focusing on the biomedical domain, where obtaining training instances is particularly difficult. We experiment with data from the BioASQ challenge, which we augment with training instances obtained from an artificial biomedical machine reading comprehension dataset, or via back-translation, information retrieval, word substitution based on word2vec embeddings or masked language modeling, question generation, or extending the given passage with additional context. We show that DA can lead to very significant performance gains, even when using large pre-trained Transformers, contributing to a broader discussion of whether and when DA benefits large pre-trained models. One of the simplest DA methods, word2vec-based word substitution, performed best and is recommended. We release our artificial training instances and code.
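The abstract reports that word2vec-based word substitution performed best. Below is a minimal sketch of the general idea. It is not the authors' released code. It makes these assumptions:

- A tiny hypothetical embedding table stands in for real word2vec vectors.
- Each in-vocabulary question token is replaced by its cosine-nearest neighbour with probability `rate`, a hypothetical parameter.

The paper's actual policy (which tokens are substituted, how many, and how answer spans are protected) is not described here.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical toy vectors standing in for pre-trained word2vec embeddings.
EMBEDDINGS: Dict[str, List[float]] = {
    "disease": [0.9, 0.1, 0.0],
    "disorder": [0.85, 0.15, 0.05],
    "gene": [0.1, 0.9, 0.0],
    "protein": [0.15, 0.8, 0.1],
    "causes": [0.0, 0.1, 0.9],
    "triggers": [0.05, 0.15, 0.85],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbour(word: str) -> Optional[str]:
    """Most similar other vocabulary word, or None if `word` is out of vocabulary."""
    if word not in EMBEDDINGS:
        return None
    candidates = [w for w in EMBEDDINGS if w != word]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


def augment(question: str, rate: float = 0.5, seed: int = 0) -> str:
    """Create one artificial question by swapping in-vocabulary tokens
    for their nearest neighbour, each with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for token in question.split():
        neighbour = nearest_neighbour(token.lower())
        if neighbour is not None and rng.random() < rate:
            out.append(neighbour)
        else:
            out.append(token)  # out-of-vocabulary or not sampled: keep as-is
    return " ".join(out)
```

For example, `augment("Which gene causes the disease", rate=1.0)` returns `"Which protein triggers the disorder"`. A real setup would load pre-trained vectors (for instance with gensim) instead of the toy table.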

A Neural Model for Joint Document and Snippet Ranking in Question Answering for Large Document Collections

Jun 16, 2021

Dimitris Pappas, Ion Androutsopoulos

Abstract: Question answering (QA) systems for large document collections typically use pipelines that (i) retrieve possibly relevant documents, (ii) re-rank them, (iii) rank paragraphs or other snippets of the top-ranked documents, and (iv) select spans of the top-ranked snippets as exact answers. Pipelines are conceptually simple, but errors propagate from one component to the next, without later components being able to revise earlier decisions. We present an architecture for joint document and snippet ranking, the two middle stages, which leverages the intuition that relevant documents have good snippets and good snippets come from relevant documents. The architecture is general and can be used with any neural text relevance ranker. We experiment with two main instantiations of the architecture, based on POSIT-DRMM (PDRMM) and a BERT-based ranker. Experiments on biomedical data from BIOASQ show that our joint models vastly outperform the pipelines in snippet retrieval, the main goal for QA, with fewer trainable parameters, also remaining competitive in document retrieval. Furthermore, our joint PDRMM-based model is competitive with BERT-based models, despite using orders of magnitude fewer parameters. These claims are also supported by human evaluation on two test batches of BIOASQ. To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval. Our joint PDRMM-based model again outperforms the corresponding pipeline in snippet retrieval on the modified Natural Questions dataset, even though it performs worse than the pipeline in document retrieval. We make our code and the modified Natural Questions dataset publicly available.

* 12 pages, 3 figures, 4 tables, ACL-IJCNLP 2021
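The intuition that relevant documents have good snippets, and good snippets come from relevant documents, can be illustrated with a toy score combination that involves no learning. This is not the PDRMM- or BERT-based architecture of the paper, which learns how the two levels interact. `Document`, `joint_scores` and the mixing weight `alpha` are hypothetical, and the document and snippet scores are assumed to come from some relevance ranker.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Document:
    doc_id: str
    doc_score: float  # document-level relevance from any ranker (assumed given)
    snippet_scores: List[float] = field(default_factory=list)  # one per snippet


def joint_scores(
    docs: List[Document], alpha: float = 0.5
) -> Tuple[List[Tuple[float, str]], List[Tuple[float, str, int]]]:
    """Toy joint ranking: a document is boosted by its best snippet, and each
    snippet is boosted by the score of the document it comes from."""
    ranked_docs = []
    ranked_snippets = []
    for d in docs:
        best_snippet = max(d.snippet_scores, default=0.0)
        ranked_docs.append((alpha * d.doc_score + (1 - alpha) * best_snippet, d.doc_id))
        for i, s in enumerate(d.snippet_scores):
            ranked_snippets.append((alpha * s + (1 - alpha) * d.doc_score, d.doc_id, i))
    ranked_docs.sort(reverse=True)      # highest combined score first
    ranked_snippets.sort(reverse=True)
    return ranked_docs, ranked_snippets
```

Take `alpha=0.5`. A document with `doc_score=0.9` and snippet scores `[0.1, 0.2]` gets 0.55. A document with `doc_score=0.6` and a single snippet scoring 0.95 gets 0.775. The joint score therefore reverses the order that ranking by `doc_score` alone would produce.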

BIOMRC: A Dataset for Biomedical Machine Reading Comprehension

May 13, 2020

Petros Stavropoulos, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald

Abstract: We introduce BIOMRC, a large-scale cloze-style biomedical MRC dataset. Care was taken to reduce noise, compared to the previous BIOREAD dataset of Pappas et al. (2018). Experiments show that simple heuristics do not perform well on the new dataset, and that two neural MRC models that had been tested on BIOREAD perform much better on BIOMRC, indicating that the new dataset is indeed less noisy or at least that its task is more feasible. Non-expert human performance is also higher on the new dataset compared to BIOREAD, and biomedical experts perform even better. We also introduce a new BERT-based MRC model, the best version of which substantially outperforms all other methods tested, reaching or surpassing the accuracy of biomedical experts in some experiments. We make the new dataset available in three different sizes, also releasing our code, and providing a leaderboard.

* 10 pages, 4 figures, 5 tables
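To make "cloze-style" concrete, here is a minimal sketch of one generic cloze instance. It assumes a common construction: a passage, a question in which one entity is masked by a placeholder, and candidate answers drawn from the entities in the passage. The exact BIOMRC construction and noise filtering are not described in this abstract.

`PLACEHOLDER`, `make_cloze` and `most_frequent_baseline` are hypothetical. The baseline is only an example of the kind of simple heuristic the abstract mentions; which heuristics the paper actually tested is not stated here.

```python
from dataclasses import dataclass
from typing import List

PLACEHOLDER = "XXXX"  # hypothetical mask token


@dataclass
class ClozeInstance:
    passage: str
    question: str
    candidates: List[str]
    answer: str


def make_cloze(passage: str, title: str, passage_entities: List[str],
               title_entity: str) -> ClozeInstance:
    """Mask one entity in the title to form the question; the answer must
    also be a passage entity so a model can select it from the candidates."""
    if title_entity not in title:
        raise ValueError("entity not found in title")
    if title_entity not in passage_entities:
        raise ValueError("answer must be one of the passage's candidate entities")
    question = title.replace(title_entity, PLACEHOLDER, 1)  # mask first mention only
    return ClozeInstance(passage, question, sorted(set(passage_entities)), title_entity)


def most_frequent_baseline(instance: ClozeInstance) -> str:
    """Simple heuristic: pick the candidate mentioned most often in the passage
    (str.count counts substring occurrences)."""
    return max(instance.candidates, key=lambda c: instance.passage.count(c))
```

Consider the passage "Aspirin reduces fever. Ibuprofen also reduces fever." with the title "Aspirin for fever", where "Aspirin" is the masked entity. The question becomes "XXXX for fever", and the frequency heuristic wrongly answers "fever".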

Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors

Jun 20, 2019

Sotiris Kotitsas, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald, Marianna Apidianaki

Abstract: Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes. Recent attempts to combine the two sources of information only consider local network structure. We extend NODE2VEC, a well-known NE method that considers broader network structure, to also consider textual node descriptors using recurrent neural encoders. Our method is evaluated on link prediction in two networks derived from UMLS. Experimental results demonstrate the effectiveness of the proposed approach compared to previous work.

* Proceedings of the 18th Workshop on Biomedical Natural Language Processing (BioNLP 2019) of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 2019
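NODE2VEC captures broader network structure through second-order biased random walks. Below is a minimal sketch of that walk on an unweighted graph. It covers only the structural part that the method above extends, not its recurrent text encoders.

When the walk moves to node `v` from node `t`, each neighbour `x` of `v` is weighted as follows:

- `1/p` if `x` is `t` (returning to the previous node);
- `1` if `x` is also a neighbour of `t`;
- `1/q` otherwise (moving outward).

The graph representation and the function name are assumptions.

```python
import random
from typing import Dict, List, Optional, Set


def node2vec_walk(graph: Dict[str, Set[str]], start: str, length: int,
                  p: float = 1.0, q: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Second-order biased random walk in the style of node2vec (unweighted graph).
    `graph` maps each node to the set of its neighbours."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])  # sorted so results do not depend on set order
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step has no previous node
            continue
        prev = walk[-2]
        weights = []
        for x in neighbours:
            if x == prev:
                weights.append(1.0 / p)  # return parameter p
            elif x in graph[prev]:
                weights.append(1.0)      # stays at distance 1 from prev
            else:
                weights.append(1.0 / q)  # in-out parameter q: move outward
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

A small `q` pushes walks outward (DFS-like), while a small `p` keeps them close to where they came from (BFS-like). In node2vec, walks like these are then fed to a skip-gram model to learn node vectors.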

AUEB at BioASQ 6: Document and Snippet Retrieval

Sep 15, 2018

Georgios-Ioannis Brokos, Polyvios Liosis, Ryan McDonald, Dimitris Pappas, Ion Androutsopoulos

Abstract: We present AUEB's submissions to the BioASQ 6 document and snippet retrieval tasks (parts of Task 6b, Phase A). Our models use novel extensions to deep learning architectures that operate solely over the text of the query and candidate documents/snippets. Our systems scored at the top or near the top for all batches of the challenge, highlighting the effectiveness of deep learning for these tasks.

* In Proceedings of the workshop BioASQ: Large-scale Biomedical Semantic Indexing and Question Answering, at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, 2018. arXiv admin note: text overlap with arXiv:1809.01682
