Abstract:Negation and uncertainty modeling are long-standing tasks in natural language processing. Linguistic theory postulates that expressions of negation and uncertainty are semantically independent from each other and the content they modify. However, previous works on representation learning do not explicitly model this independence. We therefore attempt to disentangle the representations of negation, uncertainty, and content using a Variational Autoencoder. We find that simply supervising the latent representations results in good disentanglement, but auxiliary objectives based on adversarial learning and mutual information minimization can provide additional disentanglement gains.
Abstract:Our objective in this study is to investigate the behavior of Boolean operators on combining annotation output from multiple Natural Language Processing (NLP) systems across multiple corpora and to assess how filtering by aggregation of Unified Medical Language System (UMLS) Metathesaurus concepts affects system performance for Named Entity Recognition (NER) of UMLS concepts. We used three corpora annotated for UMLS concepts: 2010 i2b2 VA challenge set (31,161 annotations), Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ) corpus (17,457 annotations including UMLS concept unique identifiers), and Fairview Health Services corpus (44,530 annotations). Our results showed that for UMLS concept matching, Boolean ensembling of the MiPACQ corpus trended towards higher performance over individual systems. Use of an approximate grid-search can help optimize the precision-recall tradeoff and can provide a set of heuristics for choosing an optimal set of ensembles.
Abstract:OBJECTIVE: Leverage existing biomedical NLP tools and DS domain terminology to produce a novel and comprehensive knowledge graph containing dietary supplement (DS) information for discovering interactions between DS and drugs, or Drug-Supplement Interactions (DSI). MATERIALS AND METHODS: We created SemRepDS (an extension of SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT-based model to remove incorrect relations before generating our knowledge graph (SuppKG). Two pathways are used to identify potential DS-Drug interactions which are then evaluated by medical professionals for mechanistic plausibility. RESULTS: Comparison analysis found that SemRepDS returned 206.9% more DS relations and 158.5% more DS entities than SemRep. The fine-tuned BERT model obtained an F1 score of 0.8605 and removed 43.86% of the relations, improving the precision of the relations by 26.4% compared to pre-filtering. SuppKG consists of 2,928 DS-specific nodes. Manual review of findings identified 44 (88%) proposed DS-Gene-Drug and 32 (64%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible. DISCUSSION: The additional relations extracted using SemRepDS generated SuppKG that was used to find plausible DSI not found in the current literature. By the nature of the SuppKG, these interactions are unlikely to have been found using SemRep without the expanded DS terminology. CONCLUSION: We successfully extend SemRep to include DS information and produce SuppKG which can be used to find potential DS-Drug interactions.