Title: Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews

May 22, 2023

Hye Sun Yun, Iain J. Marshall, Thomas Trikalinos, Byron C. Wallace

Abstract: Medical systematic reviews are crucial for informing clinical decision making and healthcare policy. But producing such reviews is onerous and time-consuming. Thus, high-quality evidence synopses are not available for many questions and may be outdated even when they are available. Large language models (LLMs) are now capable of generating long-form texts, suggesting the tantalizing possibility of automatically generating literature reviews on demand. However, LLMs sometimes generate inaccurate (and potentially misleading) texts by hallucinating or omitting important information. In the healthcare context, this may render LLMs unusable at best and dangerous at worst. Most discussion surrounding the benefits and risks of LLMs has been divorced from specific applications. In this work, we seek to qualitatively characterize the potential utility and risks of LLMs for assisting in the production of medical evidence reviews. We conducted 16 semi-structured interviews with international experts in systematic reviews, grounding discussion in the context of generating evidence reviews. Domain experts indicated that LLMs could aid in writing reviews, as a tool for drafting or creating plain language summaries, generating templates or suggestions, distilling information, crosschecking, and synthesizing or interpreting text inputs. But they also identified issues with model outputs and expressed concerns about potential downstream harms of confidently composed but inaccurate LLM outputs which might mislead. Other anticipated potential downstream harms included lessened accountability and proliferation of automatically generated reviews that might be of low quality. Informed by this qualitative analysis, we identify criteria for rigorous evaluation of biomedical LLMs aligned with domain expert views.

* 34 pages, 3 figures, 7 tables
