Title: Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness

Oct 11, 2024

Yu He Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Chang-Fu Kuo (+3 more)

Abstract: Large Language Models (LLMs) show potential for medical applications but often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG) allows customization with domain-specific information, making it suitable for healthcare. This study evaluates the accuracy, consistency, and safety of RAG models in determining fitness for surgery and providing preoperative instructions. We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses. A total of 3,682 responses were evaluated. Clinical documents were processed using LlamaIndex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects of preoperative instructions. Established guidelines and expert judgment were used to determine correct responses, with human-generated answers serving as comparisons. The LLM-RAG models generated responses within 20 seconds, significantly faster than clinicians (10 minutes). The GPT4 LLM-RAG model achieved the highest accuracy (96.4% vs. 86.6%, p=0.016), produced no hallucinations, and generated correct instructions comparable to those of clinicians. Results were consistent across both local and international guidelines. This study demonstrates the potential of LLM-RAG models for preoperative healthcare tasks, highlighting their efficiency, scalability, and reliability.

* arXiv admin note: substantial text overlap with arXiv:2402.01733
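
The retrieval step that RAG adds in front of an LLM can be illustrated with a minimal sketch. This is not the authors' pipeline: the study processed its documents with LlamaIndex, whereas the code below swaps in a plain bag-of-words cosine similarity so that it runs with the Python standard library alone. The guideline snippets and the names `GUIDELINE_CHUNKS`, `retrieve`, and `build_prompt` are hypothetical placeholders invented for illustration, not content or identifiers from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical guideline excerpts, invented for this sketch only.
# They are NOT taken from the guidelines used in the study.
GUIDELINE_CHUNKS = [
    "Patients should fast from solid food for six hours before elective surgery.",
    "Clear fluids are allowed up to two hours before anaesthesia.",
    "Review anticoagulant medication with the anaesthesia team before surgery.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=2):
    """Assemble an LLM prompt that grounds the answer in retrieved text."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, top_k))
    return (
        "Use only the guideline excerpts below.\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How long should the patient fast from solid food?",
                       GUIDELINE_CHUNKS))
```

The resulting prompt string would then be sent to whichever LLM is under evaluation. A production system would typically replace the word-count similarity with dense embeddings and a vector index.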
