Abstract:Gene expression profiles obtained through DNA microarray have proven successful in providing critical information for cancer detection classifiers. However, the limited number of samples in these datasets poses a challenge to employ complex methodologies such as deep neural networks for sophisticated analysis. To address this "small data" dilemma, Meta-Learning has been introduced as a solution to enhance the optimization of machine learning models by utilizing similar datasets, thereby facilitating a quicker adaptation to target datasets without the requirement of sufficient samples. In this study, we present a meta-learning-based approach for predicting lung cancer from gene expression profiles. We apply this framework to well-established deep learning methodologies and employ four distinct datasets for the meta-learning tasks, where one as the target dataset and the rest as source datasets. Our approach is evaluated against both traditional and deep learning methodologies, and the results show the superior performance of meta-learning on augmented source data compared to the baselines trained on single datasets. Moreover, we conduct the comparative analysis between meta-learning and transfer learning methodologies to highlight the efficiency of the proposed approach in addressing the challenges associated with limited sample sizes. Finally, we incorporate the explainability study to illustrate the distinctiveness of decisions made by meta-learning.
Abstract:Recent advancements in sequential modeling applied to Electronic Health Records (EHR) have greatly influenced prescription recommender systems. While the recent literature on drug recommendation has shown promising performance, the study of discovering a diversity of coexisting temporal relationships at the level of medical codes over consecutive visits remains less explored. The goal of this study can be motivated from two perspectives. First, there is a need to develop a sophisticated sequential model capable of disentangling the complex relationships across sequential visits. Second, it is crucial to establish multiple and diverse health profiles for the same patient to ensure a comprehensive consideration of different medical intents in drug recommendation. To achieve this goal, we introduce Attentive Recommendation with Contrasted Intents (ARCI), a multi-level transformer-based method designed to capture the different but coexisting temporal paths across a shared sequence of visits. Specifically, we propose a novel intent-aware method with contrastive learning, that links specialized medical intents of the patients to the transformer heads for extracting distinct temporal paths associated with different health profiles. We conducted experiments on two real-world datasets for the prescription recommendation task using both ranking and classification metrics. Our results demonstrate that ARCI has outperformed the state-of-the-art prescription recommendation methods and is capable of providing interpretable insights for healthcare practitioners.
Abstract:Survival analysis plays a crucial role in many healthcare decisions, where the risk prediction for the events of interest can support an informative outlook for a patient's medical journey. Given the existence of data censoring, an effective way of survival analysis is to enforce the pairwise temporal concordance between censored and observed data, aiming to utilize the time interval before censoring as partially observed time-to-event labels for supervised learning. Although existing studies mostly employed ranking methods to pursue an ordering objective, contrastive methods which learn a discriminative embedding by having data contrast against each other, have not been explored thoroughly for survival analysis. Therefore, in this paper, we propose a novel Ontology-aware Temporality-based Contrastive Survival (OTCSurv) analysis framework that utilizes survival durations from both censored and observed data to define temporal distinctiveness and construct negative sample pairs with adjustable hardness for contrastive learning. Specifically, we first use an ontological encoder and a sequential self-attention encoder to represent the longitudinal EHR data with rich contexts. Second, we design a temporal contrastive loss to capture varying survival durations in a supervised setting through a hardness-aware negative sampling mechanism. Last, we incorporate the contrastive task into the time-to-event predictive task with multiple loss components. We conduct extensive experiments using a large EHR dataset to forecast the risk of hospitalized patients who are in danger of developing acute kidney injury (AKI), a critical and urgent medical condition. The effectiveness and explainability of the proposed model are validated through comprehensive quantitative and qualitative studies.
Abstract:The advancement of social media contributes to the growing amount of content they share frequently. This framework provides a sophisticated place for people to report various real-life events. Detecting these events with the help of natural language processing has received researchers' attention, and various algorithms have been developed for this goal. In this paper, we propose a Semantic Modular Model (SMM) consisting of 5 different modules, namely Distributional Denoising Autoencoder, Incremental Clustering, Semantic Denoising, Defragmentation, and Ranking and Processing. The proposed model aims to (1) cluster various documents and ignore the documents that might not contribute to the identification of events, (2) identify more important and descriptive keywords. Compared to the state-of-the-art methods, the results show that the proposed model has a higher performance in identifying events with lower ranks and extracting keywords for more important events in three English Twitter datasets: FACup, SuperTuesday, and USElection. The proposed method outperformed the best reported results in the mean keyword-precision metric by 7.9\%.