Junu Kim

Enhancing LLMs' Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry

May 05, 2025

Junu Kim, Chaeeun Shim, Sungjin Park, Su Yeon Lee, Gee Young Suh, Chae-Man Lim, Seong Jin Choi, Song Mi Moon, Kyoung-Ho Song, Eu Suk Kim (+9 more)

Abstract: Although large language models (LLMs) have demonstrated impressive reasoning capabilities across general domains, their effectiveness in real-world clinical practice remains limited. This is likely due to insufficient exposure to real-world clinical data during training, as such data is typically excluded because of privacy concerns. To address this, we propose enhancing the clinical reasoning capabilities of LLMs by leveraging real-world clinical data. We constructed reasoning-intensive questions from a nationwide sepsis registry and fine-tuned Phi-4 on these questions using reinforcement learning, resulting in C-Reason. C-Reason exhibited strong clinical reasoning capabilities on the in-domain test set, as evidenced by both quantitative metrics and expert evaluations. Furthermore, its enhanced reasoning capabilities generalized to a sepsis dataset involving different tasks and patient cohorts, an open-ended consultation task on antibiotic use, and other diseases. Future research should focus on training LLMs with large-scale, multi-disease clinical datasets to develop more powerful, general-purpose clinical reasoning models.

Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records

Sep 11, 2024

Daeun Kyung, Junu Kim, Tackeun Kim, Edward Choi

Abstract: Chest X-ray imaging (CXR) is an important diagnostic tool used in hospitals to assess patient conditions and monitor changes over time. Generative models, specifically diffusion-based models, have shown promise in generating realistic synthetic X-rays. However, these models mainly focus on conditional generation using single-time-point data, i.e., typically CXRs taken at a specific time with their corresponding reports, which limits their clinical utility, particularly for capturing temporal changes. To address this limitation, we propose a novel framework, EHRXDiff, which predicts future CXR images by integrating previous CXRs with subsequent medical events, e.g., prescriptions and lab measures. Our framework dynamically tracks and predicts disease progression based on a latent diffusion model, conditioned on the previous CXR image and a history of medical events. We comprehensively evaluate the performance of our framework across three key aspects: clinical consistency, demographic consistency, and visual realism. We demonstrate that our framework generates high-quality, realistic future images that capture potential temporal changes, suggesting its potential for further development as a clinical simulation tool. This could offer valuable insights for patient monitoring and treatment planning in the medical field.

EHRFL: Federated Learning Framework for Heterogeneous EHRs and Precision-guided Selection of Participating Clients

Apr 20, 2024

Jiyoun Kim, Junu Kim, Kyunghoon Hur, Edward Choi

Abstract: In this study, we provide solutions to two practical yet overlooked scenarios in federated learning for electronic health records (EHRs). First, we introduce EHRFL, a framework that facilitates federated learning across healthcare institutions with distinct medical coding systems and database schemas using text-based linearization of EHRs. Second, we focus on a scenario where a single healthcare institution initiates federated learning to build a model tailored for itself, in which the number of clients must be optimized in order to reduce expenses incurred by the host. For selecting participating clients, we present a novel precision-based method, leveraging data latents to identify suitable participants for the institution. Our empirical results show that EHRFL effectively enables federated learning across hospitals with different EHR systems. Furthermore, our results demonstrate the efficacy of our precision-based method in selecting a reduced number of participating clients without compromising model performance, resulting in lower operational costs when constructing institution-specific models. We believe this work lays a foundation for the broader adoption of federated learning on EHRs.

General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History

Oct 31, 2023

Junu Kim, Chaeeun Shim, Bosco Seong Kyu Yang, Chami Im, Sung Yoon Lim, Han-Gil Jeong, Edward Choi

Abstract: Developing clinical prediction models (e.g., mortality prediction) based on electronic health records (EHRs) typically relies on expert opinion for feature selection and adjusting observation window size. This burdens experts and creates a bottleneck in the development process. We propose the Retrieval-Enhanced Medical prediction model (REMed) to address such challenges. REMed can essentially evaluate an unlimited number of clinical events, select the relevant ones, and make predictions. This approach effectively eliminates the need for manual feature selection and enables an unrestricted observation window. We verified these properties through experiments on 27 clinical tasks and two independent cohorts from publicly available EHR datasets, where REMed outperformed other contemporary architectures that aim to handle as many events as possible. Notably, we found that the preferences of REMed align closely with those of medical experts. We expect our approach to significantly expedite the development of EHR prediction models by minimizing clinicians' need for manual involvement.

* The source code for this paper is available at: https://github.com/starmpcc/REMed

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

Sep 06, 2023

Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You (+5 more)

Abstract: The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources, including the weights, code, and data used in the development of Asclepius, are made publicly accessible for future research.

* https://github.com/starmpcc/Asclepius

UniHPF : Universal Healthcare Predictive Framework with Zero Domain Knowledge

Nov 15, 2022

Kyunghoon Hur, Jungwoo Oh, Junu Kim, Jiyoun Kim, Min Jae Lee, Eunbyeol Cho, Seong-Eun Moon, Young-Hak Kim, Edward Choi

Abstract: Despite the abundance of Electronic Healthcare Records (EHR), their heterogeneity restricts the utilization of medical data in building predictive models. To address this challenge, we propose the Universal Healthcare Predictive Framework (UniHPF), which requires no medical domain knowledge and minimal pre-processing for multiple prediction tasks. Experimental results demonstrate that UniHPF is capable of building large-scale EHR models that can process any form of medical data from distinct EHR systems. We believe that our findings can provide helpful insights for further research on the multi-source learning of EHRs.

* Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 19 pages (main paper 6 pages). arXiv admin note: substantial text overlap with arXiv:2207.09858

Universal EHR Federated Learning Framework

Nov 14, 2022

Junu Kim, Kyunghoon Hur, Seongjun Yang, Edward Choi

Abstract: Federated learning (FL) is the most practical multi-source learning method for electronic healthcare records (EHR). Despite its guarantee of privacy protection, the wide application of FL is restricted by two large challenges: heterogeneous EHR systems and non-i.i.d. data. A recent study proposed a framework, named UniHPF, that unifies heterogeneous EHRs. We attempt to address both challenges simultaneously by combining UniHPF and FL. Our study is the first approach to unify heterogeneous EHRs into a single FL framework. This combination provides an average performance gain of 3.4% compared to local learning. We believe that our framework is practically applicable in real-world FL.

* Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6 pages
