Abstract: Objectives: To assess the use of artificial intelligence-based software in ruling out chest X-ray cases with no significant findings in a primary health care setting. Methods: In this retrospective study, commercially available artificial intelligence (AI) software was used to analyse 10 000 chest X-rays of Finnish primary health care patients. In studies with a mismatch between an AI normal report and the original radiologist report, a consensus read by two board-certified radiologists was conducted to make the final diagnosis. Results: After the exclusion of cases not meeting the study criteria, 9579 cases were analysed by AI. Of these cases, 4451 were considered normal in the original radiologist report and 4644 after the consensus reading. The number of cases correctly found nonsignificant by AI was 1692 (17.7% of all studies and 36.4% of studies with no significant findings). After the consensus read, there were nine confirmed false-negative studies. These studies included four cases of slightly enlarged heart size, four cases of slightly increased pulmonary opacification and one case with a small unilateral pleural effusion. This gives the AI a sensitivity of 99.8% (95% CI: 99.65-99.92) and a specificity of 36.4% (95% CI: 35.05-37.84) for recognising significant pathology on a chest X-ray. Conclusions: AI was able to correctly rule out 36.4% of the chest X-rays of primary health care patients with no significant findings, with a minimal number of false negatives that would lead to effectively no compromise on patient safety. No critical findings were missed by the software.
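The reported sensitivity and specificity can be reproduced from the counts given in the abstract. The sketch below is illustrative only: the function name `rule_out_metrics` is hypothetical, and the true-positive and false-positive counts are not stated in the abstract but derived here from the totals (9579 analysed, 4644 without significant findings, 1692 correctly ruled out, 9 false negatives). Confidence intervals are omitted because the abstract does not say which interval method was used.

```python
def rule_out_metrics(total, normal, true_negatives, false_negatives):
    """Derive rule-out metrics from the four counts reported in the abstract.

    'Positive' means a study with significant findings; 'negative' means a
    study the AI reported as having no significant findings.
    """
    abnormal = total - normal                      # studies with significant findings
    true_positives = abnormal - false_negatives    # derived, not stated in the abstract
    false_positives = normal - true_negatives      # derived, not stated in the abstract
    return {
        "abnormal": abnormal,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "sensitivity": true_positives / abnormal,
        "specificity": true_negatives / normal,
        "ruled_out_share": true_negatives / total,
    }


metrics = rule_out_metrics(total=9579, normal=4644,
                           true_negatives=1692, false_negatives=9)
print(f"sensitivity: {metrics['sensitivity']:.1%}")
print(f"specificity: {metrics['specificity']:.1%}")
print(f"share of all studies ruled out: {metrics['ruled_out_share']:.1%}")
```

Under these assumptions, specificity and the "36.4% of studies with no significant findings" figure are the same ratio (1692 / 4644), which is why the two numbers coincide in the abstract.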
Abstract: Wrist fracture is the most common type of fracture, with a high incidence rate. Conventional radiography (i.e. X-ray imaging) is routinely used for wrist fracture detection, but fracture delineation is occasionally difficult, and additional confirmation by computed tomography (CT) is then needed for diagnosis. Recent advances in the field of Deep Learning (DL), a subfield of Artificial Intelligence (AI), have shown that wrist fracture detection can be automated using Convolutional Neural Networks. However, previous studies did not pay close attention to the difficult cases which can only be confirmed via CT imaging. In this study, we have developed and analyzed a state-of-the-art DL-based pipeline for wrist (distal radius) fracture detection -- DeepWrist, and evaluated it on one general population test set and one challenging test set comprising only cases requiring confirmation by CT. Our results reveal that a typical state-of-the-art approach, such as DeepWrist, while having a near-perfect performance on the general independent test set, has a substantially lower performance on the challenging test set -- average precision of 0.99 (0.99-0.99) vs 0.64 (0.46-0.83), respectively. Similarly, the area under the ROC curve was 0.99 (0.98-0.99) vs 0.84 (0.72-0.93), respectively. Our findings highlight the importance of a meticulous analysis of DL-based models before clinical use, and reveal the need for more challenging settings for testing medical AI systems.
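For readers unfamiliar with the two metrics compared above, the following is a minimal sketch of how average precision and the area under the ROC curve can be computed from binary labels and model scores. It is not the authors' evaluation code: the function names are hypothetical, the labels and scores are toy values invented for illustration (not data from the study), and the average-precision function assumes there are no tied scores.

```python
def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive case."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    if hits == 0:
        raise ValueError("average precision is undefined without positive cases")
    return precision_sum / hits


def roc_auc(labels, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    # Count pairwise wins; ties between a positive and a negative count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = fracture, 0 = no fracture.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
ap = average_precision(toy_labels, toy_scores)
auc = roc_auc(toy_labels, toy_scores)
print(f"average precision: {ap:.3f}, ROC AUC: {auc:.3f}")
```

Average precision depends on the proportion of positive cases in the test set, whereas ROC AUC does not, so values of the two metrics are best compared within a test set rather than across test sets with different case mixes.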