Abstract:Cancer is responsible for millions of deaths worldwide every year. Although significant progress has been achieved in cancer medicine, many issues remain to be addressed for improving cancer therapy. Appropriate cancer patient stratification is the prerequisite for selecting appropriate treatment plan, as cancer patients are of known heterogeneous genetic make-ups and phenotypic differences. In this study, built upon deep phenotypic characterizations extractable from Mayo Clinic electronic health records (EHRs) and genetic test reports for a collection of cancer patients, we developed a system leveraging a joint of phenotypic and genetic features for cancer patient subgrouping. The workflow is roughly divided into three parts: feature preprocessing, cancer patient classification, and cancer patient clustering based. In feature preprocessing step, we performed filtering, retaining the most relevant features. In cancer patient classification, we utilized joint categorical features to build a patient-feature matrix and applied nine different machine learning models, Random Forests (RF), Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), Multilayer Perceptron (MLP), Gradient Boosting (GB), Convolutional Neural Network (CNN), and Feedforward Neural Network (FNN), for classification purposes. Finally, in the cancer patient clustering step, we leveraged joint embeddings features and patient-feature associations to build an undirected feature graph and then trained the cancer feature node embeddings.
Abstract:Cancer is responsible for millions of deaths worldwide every year. Although significant progress hasbeen achieved in cancer medicine, many issues remain to be addressed for improving cancer therapy.Appropriate cancer patient stratification is the prerequisite for selecting appropriate treatment plan, ascancer patients are of known heterogeneous genetic make-ups and phenotypic differences. In thisstudy, built upon deep phenotypic characterizations extractable from Mayo Clinic electronic healthrecords (EHRs) and genetic test reports for a collection of cancer patients, we evaluated variousgraph neural networks (GNNs) leveraging a joint of phenotypic and genetic features for cancer typeclassification. Models were applied and fine-tuned on the Mayo Clinic cancer disease dataset. Theassessment was done through the reported accuracy, precision, recall, and F1 values as well as throughF1 scores based on the disease class. Per our evaluation results, GNNs on average outperformed thebaseline models with mean statistics always being higher that those of the baseline models (0.849 vs0.772 for accuracy, 0.858 vs 0.794 for precision, 0.843 vs 0.759 for recall, and 0.843 vs 0.855 for F1score). Among GNNs, ChebNet, GraphSAGE, and TAGCN showed the best performance, while GATshowed the worst. We applied and compared eight GNN models including AGNN, ChebNet, GAT,GCN, GIN, GraphSAGE, SGC, and TAGCN on the Mayo Clinic cancer disease dataset and assessedtheir performance as well as compared them with each other and with more conventional machinelearning models such as decision tree, gradient boosting, multi-layer perceptron, naive bayes, andrandom forest which we used as the baselines.
Abstract:Our study provided a review of the development of clinical concept extraction applications from January 2009 to June 2019. We hope, through the studying of different approaches with variant clinical context, can enhance the decision making for the development of clinical concept extraction.
Abstract:Biomedical ontology refers to a shared conceptualization for a biomedical domain of interest that has vastly improved data management and data sharing through the open data movement. The rapid growth and availability of biomedical data make it impractical and computationally expensive to perform manual analysis and query processing with the large scale ontologies. The lack of ability in analyzing ontologies from such a variety of sources, and supporting knowledge discovery for clinical practice and biomedical research should be overcome with new technologies. In this study, we developed a Medical Topic discovery and Query generation framework (MedTQ), which was composed by a series of approaches and algorithms. A predicate neighborhood pattern-based approach introduced has the ability to compute the similarity of predicates (relations) in ontologies. Given a predicate similarity metric, machine learning algorithms have been developed for automatic topic discovery and query generation. The topic discovery algorithm, called the hierarchical K-Means algorithm was designed by extending an existing supervised algorithm (K-means clustering) for the construction of a topic hierarchy. In the hierarchical K-Means algorithm, a level-by-level optimization strategy was selected for consistent with the strongly association between elements within a topic. Automatic query generation was facilitated for discovered topic that could be guided users for interactive query design and processing. Evaluation was conducted to generate topic hierarchy for DrugBank ontology as a case study. Results demonstrated that the MedTQ framework can enhance knowledge discovery by capturing underlying structures from domain specific data and ontologies.