Abstract:Background: Electronic health records (EHRs) are a data source for opioid research. Opioid use disorder is known to be under-coded as a diagnosis, yet problematic opioid use can be documented in clinical notes. Objectives: Our goals were 1) to identify problematic opioid use from a full range of clinical notes; and 2) to compare the characteristics of patients identified as having problematic opioid use, exclusively documented in clinical notes, to those having documented ICD opioid use disorder diagnostic codes. Materials and Methods: We developed and applied a natural language processing (NLP) tool to the clinical notes of a patient cohort (n=222,371) from two Veteran Affairs service regions to identify patients with problematic opioid use. We also used a set of ICD diagnostic codes to identify patients with opioid use disorder from the same cohort. We compared the demographic and clinical characteristics of patients identified only through NLP, to those of patients identified through ICD codes. Results: NLP exclusively identified 57,331 patients; 6,997 patients had positive ICD code identifications. Patients exclusively identified through NLP were more likely to be women. Those identified through ICD codes were more likely to be male, younger, have concurrent benzodiazepine prescriptions, more comorbidities, more care encounters, and less likely to be married. Patients in the NLP and ICD groups had substantially elevated comorbidity levels compared to patients not documented as experiencing problematic opioid use. Conclusions: NLP is a feasible approach for identifying problematic opioid use not otherwise recorded by ICD codes. Clinicians may be reluctant to code for opioid use disorder. It is therefore incumbent on the healthcare team to search for documentation of opioid concerns within clinical notes.
Abstract:Identifying suicidality including suicidal ideation, attempts, and risk factors in electronic health record data in clinical notes is difficult. A major difficulty is the lack of training samples given the small number of true positive instances among the increasingly large number of patients being screened. This paper describes a novel methodology that identifies suicidality in clinical notes by addressing this data sparsity issue through zero-shot learning. U.S. Veterans Affairs clinical notes served as data. The training dataset label was determined using diagnostic codes of suicide attempt and self-harm. A base string associated with the target label of suicidality was used to provide auxiliary information by narrowing the positive training cases to those containing the base string. A deep neural network was trained by mapping the training documents contents to a semantic space. For comparison, we trained another deep neural network using the identical training dataset labels and bag-of-words features. The zero shot learning model outperformed the baseline model in terms of AUC, sensitivity, specificity, and positive predictive value at multiple probability thresholds. In applying a 0.90 probability threshold, the methodology identified notes not associated with a relevant ICD 10 CM code that documented suicidality, with 94 percent accuracy. This new method can effectively identify suicidality without requiring manual annotation.