Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Hein

Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis

Oct 11, 2024

Ameer Hamza Shakur, Michael J. Holcomb, David Hein, Shinyoung Kang, Thomas O. Dalton, Krystle K. Campbell, Daniel J. Scott, Andrew R. Jamieson

Figure 1 for Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis

Figure 2 for Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis

Figure 3 for Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis

Figure 4 for Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis

Abstract:Grading Objective Structured Clinical Examinations (OSCEs) is a time-consuming and expensive process, traditionally requiring extensive manual effort from human experts. In this study, we explore the potential of Large Language Models (LLMs) to assess skills related to medical student communication. We analyzed 2,027 video-recorded OSCE examinations from the University of Texas Southwestern Medical Center (UTSW), spanning four years (2019-2022), and several different medical cases or "stations." Specifically, our focus was on evaluating students' ability to summarize patients' medical history: we targeted the rubric item 'did the student summarize the patients' medical history?' from the communication skills rubric. After transcribing speech audio captured by OSCE videos using Whisper-v3, we studied the performance of various LLM-based approaches for grading students on this summarization task based on their examination transcripts. Using various frontier-level open-source and proprietary LLMs, we evaluated different techniques such as zero-shot chain-of-thought prompting, retrieval augmented generation, and multi-model ensemble methods. Our results show that frontier LLM models like GPT-4 achieved remarkable alignment with human graders, demonstrating a Cohen's kappa agreement of 0.88 and indicating strong potential for LLM-based OSCE grading to augment the current grading process. Open-source models also showed promising results, suggesting potential for widespread, cost-effective deployment. Further, we present a failure analysis identifying conditions where LLM grading may be less reliable in this context and recommend best practices for deploying LLMs in medical education settings.

Via

Access Paper or Ask Questions

Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

Sep 05, 2023

Shanshan Tang, Kai Wang, David Hein, Gloria Lin, Nina N. Sanford, Jing Wang

Figure 1 for Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

Figure 2 for Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

Figure 3 for Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

Figure 4 for Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

Abstract:Objectives: Approximately 30% of non-metastatic anal squamous cell carcinoma (ASCC) patients will experience recurrence after chemoradiotherapy (CRT), and currently available clinical variables are poor predictors of treatment response. We aimed to develop a model leveraging information extracted from radiation pretreatment planning CT to predict recurrence-free survival (RFS) in ASCC patients after CRT. Methods: Radiomics features were extracted from planning CT images of 96 ASCC patients. Following pre-feature selection, the optimal feature set was selected via step-forward feature selection with a multivariate Cox proportional hazard model. The RFS prediction was generated from a radiomics-clinical combined model based on an optimal feature set with five repeats of five-fold cross validation. The risk stratification ability of the proposed model was evaluated with Kaplan-Meier analysis. Results: Shape- and texture-based radiomics features significantly predicted RFS. Compared to a clinical-only model, radiomics-clinical combined model achieves better performance in the testing cohort with higher C-index (0.80 vs 0.73) and AUC (0.84 vs 0.79 for 1-year RFS, 0.84 vs 0.78 for 2-year RFS, and 0.86 vs 0.83 for 3-year RFS), leading to distinctive high- and low-risk of recurrence groups (p<0.001). Conclusions: A treatment planning CT based radiomics and clinical combined model had improved prognostic performance in predicting RFS for ASCC patients treated with CRT as compared to a model using clinical features only.

Via

Access Paper or Ask Questions