Abstract:Vertebral compression fractures (VCFs) are a common and potentially serious consequence of osteoporosis. Yet, they often remain undiagnosed. Opportunistic screening, which involves automated analysis of medical imaging data acquired primarily for other purposes, is a cost-effective method to identify undiagnosed VCFs. In high-stakes scenarios like opportunistic medical diagnosis, model interpretability is a key factor for the adoption of AI recommendations. Rule-based methods are inherently explainable and closely align with clinical guidelines, but they are not immediately applicable to high-dimensional data such as CT scans. To address this gap, we introduce a neurosymbolic approach for VCF detection in CT volumes. The proposed model combines deep learning (DL) for vertebral segmentation with a shape-based algorithm (SBA) that analyzes vertebral height distributions in salient anatomical regions. This allows for the definition of a rule set over the height distributions to detect VCFs. Evaluation of VerSe19 dataset shows that our method achieves an accuracy of 96% and a sensitivity of 91% in VCF detection. In comparison, a black box model, DenseNet, achieved an accuracy of 95% and sensitivity of 91% in the same dataset. Our results demonstrate that our intrinsically explainable approach can match or surpass the performance of black box deep neural networks while providing additional insights into why a prediction was made. This transparency can enhance clinician's trust thus, supporting more informed decision-making in VCF diagnosis and treatment planning.
Abstract:Natural language offers a convenient, flexible interface for controlling robotic C-arm X-ray systems, making advanced functionality and controls accessible. However, enabling language interfaces requires specialized AI models that interpret X-ray images to create a semantic representation for reasoning. The fixed outputs of such AI models limit the functionality of language controls. Incorporating flexible, language-aligned AI models prompted through language enables more versatile interfaces for diverse tasks and procedures. Using a language-aligned foundation model for X-ray image segmentation, our system continually updates a patient digital twin based on sparse reconstructions of desired anatomical structures. This supports autonomous capabilities such as visualization, patient-specific viewfinding, and automatic collimation from novel viewpoints, enabling commands 'Focus in on the lower lumbar vertebrae.' In a cadaver study, users visualized, localized, and collimated structures across the torso using verbal commands, achieving 84% end-to-end success. Post hoc analysis of randomly oriented images showed our patient digital twin could localize 35 commonly requested structures to within 51.68 mm, enabling localization and isolation from arbitrary orientations. Our results demonstrate how intelligent robotic X-ray systems can incorporate physicians' expressed intent directly. While existing foundation models for intra-operative X-ray analysis exhibit failure modes, as they improve, they can facilitate highly flexible, intelligent robotic C-arms.