Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework

Oct 02, 2024

Zonghai Yao, Zihao Zhang, Chaolong Tang, Xingyu Bian, Youxia Zhao, Zhichao Yang, Junda Wang, Huixue Zhou, Won Seok Jang, Feiyun Ouyang(+1 more)

Figure 1 for MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework

Figure 2 for MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework

Figure 3 for MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework

Figure 4 for MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework

Share this with someone who'll enjoy it:

Abstract:Artificial intelligence (AI) and large language models (LLMs) in healthcare require advanced clinical skills (CS), yet current benchmarks fail to evaluate these comprehensively. We introduce MedQA-CS, an AI-SCE framework inspired by medical education's Objective Structured Clinical Examinations (OSCEs), to address this gap. MedQA-CS evaluates LLMs through two instruction-following tasks, LLM-as-medical-student and LLM-as-CS-examiner, designed to reflect real clinical scenarios. Our contributions include developing MedQA-CS, a comprehensive evaluation framework with publicly available data and expert annotations, and providing the quantitative and qualitative assessment of LLMs as reliable judges in CS evaluation. Our experiments show that MedQA-CS is a more challenging benchmark for evaluating clinical skills than traditional multiple-choice QA benchmarks (e.g., MedQA). Combined with existing benchmarks, MedQA-CS enables a more comprehensive evaluation of LLMs' clinical capabilities for both open- and closed-source LLMs.

View paper on

Share this with someone who'll enjoy it:

Title:MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework

Paper and Code