Abstract:Research publications are the primary vehicle for sharing scientific progress in the form of new discoveries, methods, techniques, and insights. Publications have been studied from the perspectives of both content analysis and bibliometric structure, but a barrier to more comprehensive studies of scientific research is a lack of publicly accessible large-scale data and resources. In this paper, we present PubGraph, a new resource for studying scientific progress that takes the form of a large-scale temporal knowledge graph (KG). It contains more than 432M nodes and 15.49B edges mapped to the popular Wikidata ontology. We extract three KGs with varying sizes from PubGraph to allow experimentation at different scales. Using these KGs, we introduce a new link prediction benchmark for transductive and inductive settings with temporally-aligned training, validation, and testing partitions. Moreover, we develop two new inductive learning methods better suited to PubGraph, operating on unseen nodes without explicit features, scaling to large KGs, and outperforming existing models. Our results demonstrate that structural features of past citations are sufficient to produce high-quality predictions about new publications. We also identify new challenges for KG models, including an adversarial community-based link prediction setting, zero-shot inductive learning, and large-scale learning.
Abstract:Parkinson's disease (PD) is a degenerative and progressive neurological condition. Early diagnosis can improve treatment for patients and is performed through dopaminergic imaging techniques like the SPECT DaTscan. In this study, we propose a machine learning model that accurately classifies any given DaTscan as having Parkinson's disease or not, in addition to providing a plausible reason for the prediction. This is kind of reasoning is done through the use of visual indicators generated using Local Interpretable Model-Agnostic Explainer (LIME) methods. DaTscans were drawn from the Parkinson's Progression Markers Initiative database and trained on a CNN (VGG16) using transfer learning, yielding an accuracy of 95.2%, a sensitivity of 97.5%, and a specificity of 90.9%. Keeping model interpretability of paramount importance, especially in the healthcare field, this study utilises LIME explanations to distinguish PD from non-PD, using visual superpixels on the DaTscans. It could be concluded that the proposed system, in union with its measured interpretability and accuracy may effectively aid medical workers in the early diagnosis of Parkinson's Disease.