Abstract: We present a comprehensive evaluation of a2z-1, an artificial intelligence (AI) model designed to analyze abdomen-pelvis CT scans for 21 time-sensitive and actionable findings. Our study focuses on rigorous assessment of the model's performance and generalizability. A large-scale retrospective analysis demonstrates an average AUC of 0.931 across the 21 conditions. External validation across two distinct health systems confirms consistent performance (AUC 0.923), establishing generalizability to different evaluation scenarios, with notable performance on critical findings such as small bowel obstruction (AUC 0.958) and acute pancreatitis (AUC 0.961). Subgroup analysis shows consistent accuracy across patient sex, age groups, and varied imaging protocols, including different slice thicknesses and contrast administration types. A comparison of high-confidence model outputs with radiologist reports reveals instances where a2z-1 identified overlooked findings, suggesting potential for quality assurance applications.
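The abstract above reports an AUC averaged over 21 findings. As a minimal sketch of how such a figure could be computed, the code below takes the unweighted (macro) mean of per-finding AUCs. The abstract does not state the averaging scheme, so the unweighted mean is an assumption, and the function names and toy data are hypothetical illustrations rather than the study's code or results.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(per_finding):
    """Return per-finding AUCs and their unweighted mean (assumed scheme)."""
    aucs = {name: roc_auc(y, s) for name, (y, s) in per_finding.items()}
    return aucs, mean(aucs.values())


# Toy data: two hypothetical findings, each with (labels, model scores).
toy = {
    "finding_a": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "finding_b": ([1, 0, 0, 1], [0.6, 0.7, 0.1, 0.8]),
}
aucs, avg = macro_auc(toy)
print(aucs, avg)
```

A macro average weights every finding equally regardless of prevalence, so a rare finding influences the summary as much as a common one.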
Abstract: Radiologists face increasing workload pressures amid growing imaging volumes, creating risks of burnout and reporting delays. While artificial intelligence (AI)-based automated radiology report generation shows promise for optimizing reporting workflows, evidence of its real-world impact on clinical accuracy and efficiency remains limited. This study evaluated the effect of draft reports on radiology reporting workflows through a three-reader, multi-case study comparing standard and AI-assisted reporting workflows. In both workflows, radiologists reviewed the cases and modified either a standard template (standard workflow) or an AI-generated draft report (AI-assisted workflow) to create the final report. For controlled evaluation, we used GPT-4 to generate simulated AI drafts and deliberately introduced 1-3 errors in half of the cases to mimic real AI system performance. The AI-assisted workflow significantly reduced average reporting time from 573 to 435 seconds (p=0.003), with no statistically significant difference in clinically significant errors between workflows. These findings suggest that AI-generated drafts can meaningfully accelerate radiology reporting while maintaining diagnostic accuracy, offering a practical way to address mounting workload challenges in clinical practice.
Abstract: During the COVID-19 pandemic, rapid and accurate triage of patients at the emergency department is critical to inform decision-making. We propose a data-driven approach for automatic prediction of deterioration risk using a deep neural network that learns from chest X-ray images and a gradient boosting model that learns from routine clinical variables. Our AI prognosis system, trained using data from 3,661 patients, achieves an AUC of 0.786 (95% CI: 0.742-0.827) when predicting deterioration within 96 hours. The deep neural network extracts informative areas of chest X-ray images to assist clinicians in interpreting the predictions, and performs comparably to two radiologists in a reader study. To verify performance in a real clinical setting, we silently deployed a preliminary version of the deep neural network at NYU Langone Health during the first wave of the pandemic, where it produced accurate predictions in real time. In summary, our findings demonstrate the potential of the proposed system for assisting front-line physicians in the triage of COVID-19 patients.
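The abstract above reports an AUC together with a 95% confidence interval. One common way to obtain such an interval is a percentile bootstrap over patients, sketched below. The abstract does not say which method the authors used, so the bootstrap is an assumption, and the function names, parameters, and toy data are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval."""
    if len(set(labels)) < 2:
        raise ValueError("need both positive and negative cases")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample lacks one class
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return roc_auc(labels, scores), lo, hi


# Toy data: 1 = deteriorated, 0 = did not; scores are hypothetical risks.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.75, 0.6, 0.3, 0.7, 0.5, 0.4, 0.35, 0.2, 0.1, 0.05]
auc, lo, hi = bootstrap_auc_ci(labels, scores, n_boot=200)
print(auc, lo, hi)
```

Fixing the seed makes the interval reproducible. With a small sample like this toy set the interval is wide, whereas a test set of realistic size would narrow it.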