UC San Diego
Abstract:Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing scenarios, particularly in simulating domain-specific experts using tailored prompts. This ability enables LLMs to adopt the persona of individuals with specific backgrounds, offering a cost-effective and efficient alternative to traditional, resource-intensive user studies. By mimicking human behavior, LLMs can anticipate responses based on concrete demographic or professional profiles. In this paper, we evaluate the effectiveness of LLMs in simulating individuals with diverse backgrounds and analyze the consistency of these simulated behaviors compared to real-world outcomes. In particular, we explore the potential of LLMs to interpret and respond to discharge summaries provided to patients leaving the Intensive Care Unit (ICU). We evaluate and compare with human responses the comprehensibility of discharge summaries among individuals with varying educational backgrounds, using this analysis to assess the strengths and limitations of LLM-driven simulations. Notably, when LLMs are primed with educational background information, they deliver accurate and actionable medical guidance 88% of the time. However, when other information is provided, performance significantly drops, falling below random chance levels. This preliminary study shows the potential benefits and pitfalls of automatically generating patient-specific health information from diverse populations. While LLMs show promise in simulating health personas, our results highlight critical gaps that must be addressed before they can be reliably used in clinical settings. Our findings suggest that a straightforward query-response model could outperform a more tailored approach in delivering health information. This is a crucial first step in understanding how LLMs can be optimized for personalized health communication while maintaining accuracy.
Abstract:Objective. Annotation is expensive but essential for clinical note review and clinical natural language processing (cNLP). However, the extent to which computer-generated pre-annotation is beneficial to human annotation is still an open question. Our study introduces CLEAN (CLinical note rEview and ANnotation), a pre-annotation-based cNLP annotation system to improve clinical note annotation of data elements, and comprehensively compares CLEAN with the widely-used annotation system Brat Rapid Annotation Tool (BRAT). Materials and Methods. CLEAN includes an ensemble pipeline (CLEAN-EP) with a newly developed annotation tool (CLEAN-AT). A domain expert and a novice user/annotator participated in a comparative usability test by tagging 87 data elements related to Congestive Heart Failure (CHF) and Kawasaki Disease (KD) cohorts in 84 public notes. Results. CLEAN achieved higher note-level F1-score (0.896) over BRAT (0.820), with significant difference in correctness (P-value < 0.001), and the mostly related factor being system/software (P-value < 0.001). No significant difference (P-value 0.188) in annotation time was observed between CLEAN (7.262 minutes/note) and BRAT (8.286 minutes/note). The difference was mostly associated with note length (P-value < 0.001) and system/software (P-value 0.013). The expert reported CLEAN to be useful/satisfactory, while the novice reported slight improvements. Discussion. CLEAN improves the correctness of annotation and increases usefulness/satisfaction with the same level of efficiency. Limitations include untested impact of pre-annotation correctness rate, small sample size, small user size, and restrictedly validated gold standard. Conclusion. CLEAN with pre-annotation can be beneficial for an expert to deal with complex annotation tasks involving numerous and diverse target data elements.
Abstract:In modern electronic medical records (EMR) much of the clinically important data - signs and symptoms, symptom severity, disease status, etc. - are not provided in structured data fields, but rather are encoded in clinician generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. A general architecture in an NLP system consists of two main components: background knowledge that includes biomedical knowledge resources and a framework that integrates NLP tools to process text. Systems differ in both components, which we will review briefly. Additionally, challenges facing current research efforts in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.
Abstract:Systems that exploit publicly available user generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel filtering method for Influenza-Like-Illnesses (ILI)-related messages using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks for the US 2009 influenza season from 30th August 2009 to 8th May 2010. Results showed that our system achieved the highest Pearson correlation coefficient of 98.46% (p-value<2.2e-16), an improvement of 3.98% over the previous state-of-the-art method. The results indicate that simple NLP-based enhancements to existing approaches to mine Twitter data can increase the value of this inexpensive resource.
Abstract:In many real-world applications of machine learning classifiers, it is essential to predict the probability of an example belonging to a particular class. This paper proposes a simple technique for predicting probabilities based on optimizing a ranking loss, followed by isotonic regression. This semi-parametric technique offers both good ranking and regression performance, and models a richer set of probability distributions than statistical workhorses such as logistic regression. We provide experimental results that show the effectiveness of this technique on real-world applications of probability prediction.