Abstract: Medical benchmark datasets significantly contribute to developing Large Language Models (LLMs) for medical knowledge extraction, diagnosis, summarization, and other uses. Yet, current benchmarks are mainly derived from exam questions given to medical students or cases described in the medical literature, lacking the complexity of real-world patient cases that deviate from classic textbook abstractions. These include rare diseases, uncommon presentations of common diseases, and unexpected treatment responses. Here, we construct the Clinically Uncommon Patient Cases and Diagnosis Dataset (CUPCase), based on 3,562 real-world case reports from BMC, with diagnoses provided both in open-ended textual format and as multiple-choice options with distractors. Using this dataset, we evaluate the ability of state-of-the-art LLMs, both general-purpose and clinical, to identify and correctly diagnose a patient case, and test models' performance when only partial information about cases is available. Our findings show that the general-purpose GPT-4o attains the best performance on both the multiple-choice task (average accuracy of 87.9%) and the open-ended task (BERTScore F1 of 0.764), outperforming several medical-domain LLMs such as Meditron-70B and MedLM-Large. Moreover, GPT-4o maintained 87% and 88% of its performance with only the first 20% of tokens of the case presentation in the multiple-choice and free-text tasks, respectively, highlighting the potential of LLMs to aid in early diagnosis in real-world cases. CUPCase expands our ability to evaluate LLMs for clinical decision support in an open and reproducible manner.
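The partial-information evaluation described above (keeping only the first 20% of a case presentation's tokens before asking for a diagnosis) can be illustrated with a short sketch. The prompt format, the `query_llm` callable, and the field names (`presentation`, `options`, `answer`) are hypothetical placeholders, not the authors' actual evaluation harness.

```python
def truncate_case(case_text: str, fraction: float = 0.2) -> str:
    """Keep only the leading fraction of whitespace-delimited tokens."""
    tokens = case_text.split()
    cutoff = max(1, int(len(tokens) * fraction))
    return " ".join(tokens[:cutoff])


def multiple_choice_accuracy(cases, query_llm, fraction=0.2):
    """cases: list of dicts with 'presentation', 'options', and 'answer' (a letter)."""
    cases = list(cases)
    correct = 0
    for case in cases:
        partial = truncate_case(case["presentation"], fraction)
        options = "\n".join(
            f"{chr(65 + i)}. {opt}" for i, opt in enumerate(case["options"])
        )
        prompt = (
            f"Partial patient case:\n{partial}\n\n"
            f"Which diagnosis is most likely?\n{options}\n"
            "Answer with a single letter."
        )
        prediction = query_llm(prompt)  # e.g. "B"
        correct += int(prediction.strip().upper().startswith(case["answer"]))
    return correct / len(cases)
```

Sweeping `fraction` from 0.2 to 1.0 and comparing accuracies against the full-text baseline reproduces the kind of partial-information comparison reported above.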
Abstract: Utility and privacy are two crucial measures of the quality of synthetic tabular data. While significant advancements have been made in privacy measures, generating synthetic samples with high utility remains challenging. To enhance the utility of synthetic samples, we propose a novel architecture called the DownStream Feedback Generative Adversarial Network (DSF-GAN). This approach incorporates feedback from a downstream prediction model during training to augment the generator's loss function with valuable information. Thus, DSF-GAN utilizes a downstream prediction task to enhance the utility of synthetic samples. To evaluate our method, we test it on two popular datasets. Our experiments demonstrate improved model performance when training on synthetic samples generated by DSF-GAN, compared to those generated by the same GAN architecture without feedback. The evaluation was conducted on the same validation set comprising real samples. All code and datasets used in this research will be made openly available for ease of reproduction.
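A minimal sketch of the downstream-feedback idea, assuming a PyTorch-style training loop: the generator's adversarial loss is augmented with the loss of a downstream prediction model evaluated on the synthetic batch. The network interfaces, the weighting term `lambda_df`, the choice of MSE as the downstream loss, and treating one column as the label are assumptions for illustration, not the exact DSF-GAN formulation.

```python
import torch
import torch.nn.functional as F


def generator_step(generator, discriminator, downstream_model, noise,
                   target_col, lambda_df=1.0):
    """One generator update combining adversarial and downstream-feedback losses."""
    fake = generator(noise)  # synthetic tabular batch, shape (batch, n_cols)

    # Standard adversarial term: the generator tries to make the
    # discriminator score synthetic rows as real.
    logits = discriminator(fake)
    adv_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # Downstream feedback term: a prediction model is run on the synthetic
    # batch, treating column `target_col` as the label. MSE is a placeholder
    # for whatever task loss the downstream model actually uses.
    features = torch.cat([fake[:, :target_col], fake[:, target_col + 1:]], dim=1)
    labels = fake[:, target_col]
    df_loss = F.mse_loss(downstream_model(features).squeeze(-1), labels)

    # Both terms flow gradients back into the generator.
    return adv_loss + lambda_df * df_loss
```

Setting `lambda_df = 0` recovers the plain GAN generator loss, which is the no-feedback baseline the abstract compares against.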
Abstract: Electronic health records (EHR) often represent certain subpopulations (SPs) at different rates. Factors such as patient demographics, clinical condition prevalence, and medical center type contribute to this underrepresentation. Consequently, machine learning models trained on such datasets struggle to generalize well and perform poorly on underrepresented SPs. To address this issue, we propose a novel ensemble framework that utilizes generative models. Specifically, we train a GAN-based synthetic data generator for each SP and incorporate the synthetic samples into each SP's training set. Finally, we train SP-specific prediction models. To properly evaluate this method, we design an evaluation pipeline with two real-world use-case datasets queried from the MIMIC database. Our approach improves model performance on underrepresented SPs. Our code and models are provided as supplementary material and will be made available in a public repository.
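The per-subpopulation augmentation loop described above might look roughly like the following sketch. `train_gan`, `sample_synthetic`, and `train_classifier` are hypothetical helpers standing in for the GAN-based generator and the SP-specific prediction models, and the subpopulation column name is an assumption; this is not the authors' released code.

```python
import pandas as pd


def augment_and_train(df: pd.DataFrame, sp_col: str, n_synth: int,
                      train_gan, sample_synthetic, train_classifier):
    """Return one prediction model per subpopulation, each trained on
    its real samples plus GAN-generated synthetic samples."""
    models = {}
    for sp, sp_df in df.groupby(sp_col):
        gan = train_gan(sp_df)                  # generator fit on this SP only
        synth = sample_synthetic(gan, n_synth)  # synthetic rows for this SP
        augmented = pd.concat([sp_df, synth], ignore_index=True)
        models[sp] = train_classifier(augmented)  # SP-specific predictor
    return models
```

Evaluating each `models[sp]` on held-out real samples from the same SP mirrors the evaluation pipeline the abstract describes.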