Abstract: Online symptom checkers have significant potential to improve patient care; however, their reliability and accuracy remain variable. We hypothesised that an artificial intelligence (AI)-powered triage and diagnostic system would compare favourably with human doctors with respect to triage and diagnostic accuracy. We performed a prospective validation study of the accuracy and safety of an AI-powered triage and diagnostic system. Identical cases were evaluated by both an AI system and human doctors. Differential diagnoses and triage outcomes were evaluated by an independent judge, who was blinded to the source (AI system or human doctor) of the outcomes. Independently of these cases, vignettes from publicly available resources were also assessed to provide a benchmark against previous studies and the diagnostic component of the MRCGP exam. Overall, we found that the Babylon AI-powered Triage and Diagnostic System was able to identify the condition modelled by a clinical vignette with accuracy comparable to that of human doctors (in terms of precision and recall). In addition, we found that the triage advice recommended by the AI system was, on average, safer than that of human doctors when compared to the ranges of acceptable triage provided by independent expert judges, with only a minimal reduction in appropriateness.
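The evaluation measures named in this abstract (precision and recall of a differential diagnosis, and safety and appropriateness of triage against an acceptable range) can be illustrated with a minimal sketch. The definitions below are assumptions made for illustration and are not taken from the study itself: precision is the fraction of listed conditions that are relevant, recall is the fraction of relevant conditions that are listed, a triage recommendation is "safe" if it is at least as urgent as the lower bound of the acceptable range, and "appropriate" if it falls within that range. The urgency scale and the function names are hypothetical.

```python
# Hypothetical urgency scale, ordered from least to most urgent (an assumption).
TRIAGE_LEVELS = ["self-care", "pharmacy", "gp", "urgent-care", "emergency"]


def precision_recall(differential, relevant):
    """Return (precision, recall) of a differential against the relevant conditions."""
    listed = set(differential)
    relevant = set(relevant)
    hits = listed & relevant
    precision = len(hits) / len(listed) if listed else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def triage_outcome(recommended, acceptable_min, acceptable_max):
    """Return (safe, appropriate) for one recommendation.

    Assumed definitions: safe means no less urgent than the lower bound of
    the acceptable range; appropriate means inside the range.
    """
    level = TRIAGE_LEVELS.index(recommended)
    low = TRIAGE_LEVELS.index(acceptable_min)
    high = TRIAGE_LEVELS.index(acceptable_max)
    safe = level >= low
    appropriate = low <= level <= high
    return safe, appropriate


if __name__ == "__main__":
    # One vignette modelling a single condition, with a three-item differential.
    print(precision_recall(["migraine", "tension headache", "meningitis"], ["migraine"]))
    # Over-triage: safe under the assumed definition, but not appropriate.
    print(triage_outcome("emergency", "gp", "urgent-care"))
```

Under these assumed definitions, over-triage counts as safe but not appropriate, which is one way a system could be safer on average while being slightly less appropriate.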
Abstract: Prior to seeking professional medical care, it is increasingly common for patients to use online resources such as automated symptom checkers. Many such systems attempt to provide a differential diagnosis based on the symptoms elicited from the user, which may lead to anxiety if life- or limb-threatening conditions are part of the list, a phenomenon termed 'cyberchondria' [1]. Systems that provide advice on where to seek help, rather than a diagnosis, are equally popular and, in our view, provide the most useful information. In this technical report we describe how such a triage system can be modelled computationally, how medical insights can be translated into triage flows, and how such systems can be validated and tested. We present babylon check, our commercially deployed automated triage system, as a case study, and illustrate its performance in a large, semi-naturalistic deployment study.