Abstract:Identifying a patient's key problems over time is a common task for providers at the point care, yet a complex and time-consuming activity given current electric health records. To enable a problem-oriented summarizer to identify a patient's comprehensive list of problems and their salience, we propose an unsupervised phenotyping approach that jointly learns a large number of phenotypes/problems across structured and unstructured data. To identify the appropriate granularity of the learned phenotypes, the model is trained on a target patient population of the same clinic. To enable the content organization of a problem-oriented summarizer, the model identifies phenotype relatedness as well. The model leverages a correlated-mixed membership approach with variational inference applied to heterogenous clinical data. In this paper, we focus our experiments on assessing the learned phenotypes and their relatedness as learned from a specific patient population. We ground our experiments in phenotyping patients from an HIV clinic in a large urban care institution (n=7,523), where patients have voluminous, longitudinal documentation, and where providers would benefit from summaries of these patient's medical histories, whether about their HIV or any comorbidities. We find that the learned phenotypes and their relatedness are clinically valid when assessed qualitatively by clinical experts, and that the model surpasses baseline in inferring phenotype-relatedness when comparing to existing expert-curated condition groupings.