Title: Large Language Models in Healthcare: A Comprehensive Benchmark

Apr 25, 2024

Andrew Liu, Hongjian Zhou, Yining Hua, Omid Rohanian, Lei Clifton, David A. Clifton

Abstract: The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering task with answer options for evaluation. However, in real clinical settings, many clinical decisions, such as treatment recommendations, involve answering open-ended questions without pre-set options. Meanwhile, existing studies mainly use accuracy to assess model performance. In this paper, we comprehensively benchmark diverse LLMs in healthcare to clearly understand their strengths and weaknesses. Our benchmark contains seven tasks and thirteen datasets across medical language generation, understanding, and reasoning. We conduct a detailed evaluation of sixteen existing LLMs in healthcare under both zero-shot and few-shot (i.e., 1-, 3-, and 5-shot) learning settings. We report the results on five metrics (i.e., matching, faithfulness, comprehensiveness, generalizability, and robustness) that are critical in achieving trust from clinical users. We further invite medical experts to conduct human evaluation.
