Abstract: A medical AI system's generalizability describes the continuity of its performance across data acquired from varying geographic, historical, and methodologic settings. Previous literature on this topic has mostly focused on "how" to achieve high generalizability, with limited success. Instead, we aim to understand "when" generalizability is achieved: our study presents a medical AI system that can estimate its generalizability status for unseen data on the fly. We introduce a latent space mapping (LSM) approach that uses a Fréchet distance loss to force the underlying training data distribution into a multivariate normal distribution. During deployment, a given test dataset's LSM distribution is processed to detect its deviation from the forced distribution; hence, the AI system can predict its generalizability status for any previously unseen dataset. If low model generalizability is detected, the user is informed by a warning message. While the approach is applicable to most classification deep neural networks, we demonstrate its application to a brain metastases (BM) detector for T1-weighted contrast-enhanced (T1c) 3D MRI. The BM detection model was trained using 175 T1c studies acquired internally, and tested using (1) 42 internally acquired exams and (2) 72 externally acquired exams from the publicly distributed Brain Mets dataset provided by the Stanford University School of Medicine. Generalizability scores, false positive (FP) rates, and sensitivities of the BM detector were computed for the test datasets. The model predicted its generalizability to be low for 31% of the testing data; it produced (1) ~13.5 FPs at 76.1% BM detection sensitivity for the low-generalizability group and (2) ~10.5 FPs at 89.2% BM detection sensitivity for the high-generalizability group. The results suggest that the proposed formulation enables a model to predict its generalizability for unseen data.
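The deviation check described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the forced distribution is the standard normal N(0, I), fits an independent (diagonal-covariance) Gaussian to the latent vectors, and uses the closed-form Fréchet distance between Gaussians. The function names and the threshold value are hypothetical.

```python
import random
import statistics

def frechet_to_standard_normal(latents):
    """Squared Fréchet distance between a diagonal-Gaussian fit of `latents`
    (a list of equal-length vectors) and the standard normal N(0, I).
    Assumption: dimensions are treated as independent."""
    total = 0.0
    for values in zip(*latents):  # iterate over latent dimensions
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values)
        # Per-dimension Gaussian Fréchet term against N(0, 1).
        total += mu ** 2 + (sigma - 1.0) ** 2
    return total

def low_generalizability(latents, threshold=1.0):
    """Return True when the deviation exceeds a hypothetical threshold."""
    return frechet_to_standard_normal(latents) > threshold

# Synthetic latent vectors: one set matching N(0, I), one shifted away from it.
random.seed(0)
in_distribution = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
shifted = [[v + 3.0 for v in row] for row in in_distribution]
print(low_generalizability(in_distribution))  # small distance expected
print(low_generalizability(shifted))          # large distance expected
```

A full-covariance variant would replace the per-dimension term with the matrix form of the Gaussian Fréchet distance; the diagonal version is used here only to keep the sketch dependency-free.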
Abstract: The detection of brain metastases (BM) in their early stages could have a positive impact on the outcome of cancer patients. We previously developed a framework for detecting small BM (with diameters of less than 15 mm) in T1-weighted contrast-enhanced 3D magnetic resonance images (T1c) to assist medical experts in this time-sensitive and high-stakes task. The framework utilizes a dedicated convolutional neural network (CNN) trained using labeled T1c data, where the ground truth BM segmentations were provided by a radiologist. This study aims to advance the framework with a noisy student-based self-training strategy to make use of a large corpus of unlabeled T1c data (i.e., data without BM segmentations or detections). Accordingly, the work (1) describes the student and teacher CNN architectures, (2) presents data and model noising mechanisms, and (3) introduces a novel pseudo-labeling strategy factoring in the learned BM detection sensitivity of the framework. Finally, it describes a semi-supervised learning strategy utilizing these components. We performed the validation using 217 labeled and 1247 unlabeled T1c exams via 2-fold cross-validation. The framework utilizing only the labeled exams produced 9.23 false positives at 90% BM detection sensitivity, whereas the framework using the introduced learning strategy led to a ~9% reduction in false detections (i.e., 8.44) at the same sensitivity level. Furthermore, while experiments utilizing 75% and 50% of the labeled datasets resulted in algorithm performance degradation (12.19 and 13.89 false positives, respectively), the impact was less pronounced with the noisy student-based training strategy (10.79 and 12.37 false positives, respectively).
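The reported reductions can be checked with a short calculation. The sketch below uses only the false-positive counts quoted in the abstract; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# (labeled-only FPs, noisy-student FPs) at 90% sensitivity, as quoted above,
# keyed by the fraction of the labeled data used for training.
reported = {"100%": (9.23, 8.44), "75%": (12.19, 10.79), "50%": (13.89, 12.37)}

for fraction, (baseline, improved) in reported.items():
    reduction = relative_reduction(baseline, improved)
    print(f"{fraction} of labeled data: {reduction:.1%} fewer false positives")
```

With the full labeled set the reduction comes to about 8.6%, consistent with the "~9%" stated in the abstract; the reductions with 75% and 50% of the labeled data are about 11.5% and 10.9%.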
Abstract: Early detection of brain metastases (BM) is one of the determining factors for the successful treatment of patients with cancer; however, the accurate detection of small BM lesions (< 15 mm) remains a challenging task. We previously described a framework for the detection of small BM in single-sequence gadolinium-enhanced T1-weighted 3D MRI datasets. It combined classical image processing (IP) with a dedicated convolutional neural network, taking approximately 30 seconds to process each dataset due to computation-intensive IP stages. To overcome the speed limitation, this study aims to reformulate the framework via an augmented pair of CNNs (eliminating the IP) to reduce the processing times while preserving the BM detection performance. Our previous implementation of the BM detection algorithm utilized Laplacian of Gaussians (LoG) for the candidate selection portion of the solution. In this study, we introduce a novel BM candidate detection CNN (cdCNN) to replace this classical IP stage. The network is formulated to have (1) a receptive field similar to that of the LoG method, and (2) a bias for the detection of BM lesion loci. The proposed CNN is then augmented with a classification CNN to perform the BM detection task. The cdCNN achieved 97.4% BM detection sensitivity when producing 60K candidates per 3D MRI dataset, while the LoG achieved 96.5% detection sensitivity with 73K candidates. The augmented BM detection framework generated on average 9.20 false-positive BM detections per patient at 90% sensitivity, which is comparable with our previous results. However, it processes each 3D dataset in 1.9 seconds, a 93.5% reduction in computation time.
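The two-stage structure described above (candidate detection followed by classification) can be sketched generically. This is an illustration of the control flow only, not the cdCNN or the classification CNN: `propose_candidates`, `classify_candidate`, the threshold, and the toy stand-ins are all hypothetical.

```python
def detect_bm(volume, propose_candidates, classify_candidate, threshold=0.5):
    """Two-stage detection: propose candidate positions, then keep the ones
    the classifier scores at or above `threshold`."""
    detections = []
    for position in propose_candidates(volume):
        score = classify_candidate(volume, position)
        if score >= threshold:
            detections.append((position, score))
    return detections

# Toy stand-ins (hypothetical): a "volume" is a dict mapping voxel
# coordinates to intensities, and both stages simply read the intensity.
def toy_candidates(volume):
    return [pos for pos, value in volume.items() if value > 0.3]

def toy_classifier(volume, position):
    return volume[position]

toy_volume = {(0, 0, 0): 0.1, (1, 2, 3): 0.4, (4, 5, 6): 0.9}
print(detect_bm(toy_volume, toy_candidates, toy_classifier))
```

In this arrangement the first stage trades precision for sensitivity by passing many candidates, and the second stage prunes them; replacing only the first stage leaves the rest of the pipeline unchanged.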
Abstract: Coronary Computed Tomography Angiography (CCTA) evaluation of chest-pain patients in an Emergency Department (ED) is considered appropriate. While a negative CCTA interpretation supports direct patient discharge from an ED, labor-intensive analyses are required, and accuracy can be jeopardized by distractions. We describe the development of an Artificial Intelligence (AI) algorithm and workflow for assisting interpreting physicians in CCTA screening for the absence of coronary atherosclerosis. The two-phase approach consisted of (1) Phase 1, focused on the development and preliminary testing of an algorithm for vessel-centerline extraction classification in a balanced study population (n = 500 with 50% disease prevalence) derived by retrospective random case selection; and (2) Phase 2, concerned with simulated clinical trialing of the developed algorithm on a per-case basis in a more real-world study population (n = 100 with 28% disease prevalence) from an ED chest-pain series. This allowed pre-deployment evaluation of the AI-based CCTA screening application, which provides a vessel-by-vessel graphic display of algorithm inference results integrated into a clinically capable viewer. Algorithm performance evaluation used Area Under the Receiver-Operating-Characteristic Curve (AUC-ROC); confusion matrices reflected ground-truth vs. AI determinations. The vessel-based algorithm demonstrated strong performance with AUC-ROC = 0.96. In both Phase 1 and Phase 2, independent of disease prevalence differences, negative predictive values at the case level were very high at 95%. The rate of completion of the algorithm workflow process (96% with inference results in 55-80 seconds) in Phase 2 depended on adequate image quality. There is potential for this AI application to assist in CCTA interpretation to help exclude atherosclerosis in chest-pain presentations.
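For readers less familiar with the metric, negative predictive value (NPV) can be computed either from confusion-matrix counts or from sensitivity, specificity, and prevalence. The sketch below is illustrative only: the counts and rates used in the example are made up to show the formulas and are not results from the study.

```python
def npv_from_counts(true_negatives, false_negatives):
    """NPV = TN / (TN + FN): the share of negative calls that are correct."""
    return true_negatives / (true_negatives + false_negatives)

def npv_from_rates(sensitivity, specificity, prevalence):
    """NPV via Bayes' rule from sensitivity, specificity, and prevalence."""
    tn = specificity * (1.0 - prevalence)   # true-negative fraction
    fn = (1.0 - sensitivity) * prevalence   # false-negative fraction
    return tn / (tn + fn)

# Illustrative values only (not taken from the study).
print(npv_from_counts(true_negatives=57, false_negatives=3))
print(npv_from_rates(sensitivity=0.9, specificity=0.9, prevalence=0.50))
print(npv_from_rates(sensitivity=0.9, specificity=0.9, prevalence=0.28))
```

At fixed sensitivity and specificity, NPV rises as prevalence falls, which is why the two prevalence figures quoted in the abstract (50% and 28%) matter when comparing case-level NPVs across phases.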