Abstract:Purpose: Pulmonary embolism (PE) is a significant cause of mortality in the United States. The objective of this study is to implement deep learning (DL) models using Computed Tomography Pulmonary Angiography (CTPA), clinical data, and PE Severity Index (PESI) scores to predict PE mortality. Materials and Methods: 918 patients (median age 64 years, range 13-99 years, 52% female) with 3,978 CTPAs were identified via retrospective review across three institutions. To predict survival, an AI model was used to extract disease-related imaging features from CTPAs. Imaging features and/or clinical variables were then incorporated into DL models to predict survival outcomes. Four models were developed as follows: (1) using CTPA imaging features only; (2) using clinical variables only; (3) multimodal, integrating both CTPA and clinical variables; and (4) multimodal fused with the calculated PESI score. Performance and the contribution from each modality were evaluated using the concordance index (c-index) and Net Reclassification Improvement, respectively. Performance was compared to PESI predictions using the Wilcoxon signed-rank test. Kaplan-Meier analysis was performed to stratify patients into high- and low-risk groups. An additional risk-factor analysis was conducted to account for right ventricular (RV) dysfunction. Results: For both data sets, the PESI-fused and multimodal models achieved higher c-indices than PESI alone. Following stratification of patients into high- and low-risk groups by the multimodal and PESI-fused models, mortality outcomes differed significantly (both p<0.001). A strong correlation was found between high-risk grouping and RV dysfunction. Conclusions: Multimodal DL models incorporating CTPA features, clinical data, and PESI achieved higher c-indices than PESI alone for PE survival prediction.
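Below is a minimal, illustrative sketch of the evaluation protocol described above (c-index computation and Kaplan-Meier stratification into high- and low-risk groups), assuming per-patient risk scores from a trained model; the synthetic data and lifelines-based implementation are stand-ins, not the study's actual pipeline.

```python
# Minimal sketch: concordance index and Kaplan-Meier stratification by median
# predicted risk, using lifelines. Risk scores below are synthetic placeholders
# standing in for outputs of a multimodal survival model.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 200
risk = rng.normal(size=n)                     # hypothetical model risk scores
time = rng.exponential(365 * np.exp(-risk))   # follow-up (days), shorter for higher risk
event = rng.random(n) < 0.7                   # True = death observed, False = censored

# c-index: a higher score should mean shorter survival, so the risk is negated.
cindex = concordance_index(time, -risk, event_observed=event)
print(f"c-index: {cindex:.3f}")

# Stratify at the median risk and compare survival between the two groups.
high = risk >= np.median(risk)
result = logrank_test(time[high], time[~high],
                      event_observed_A=event[high], event_observed_B=event[~high])
print(f"log-rank p-value: {result.p_value:.4f}")

kmf_high, kmf_low = KaplanMeierFitter(), KaplanMeierFitter()
kmf_high.fit(time[high], event_observed=event[high], label="high risk")
kmf_low.fit(time[~high], event_observed=event[~high], label="low risk")
print(kmf_high.median_survival_time_, kmf_low.median_survival_time_)
```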
Abstract:The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, \textbf{S}tructural \textbf{E}ntities extraction and patient indications \textbf{I}ncorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces noise in the subsequent cross-modal alignment module, which aligns X-ray images with factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This allows the text decoder to attend to discriminative features of the X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients, which in turn helps it produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.
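As a hedged illustration of the gradient-free retrieval step, the sketch below ranks historical cases by cosine similarity between image embeddings; the embeddings and dimensions are placeholders, and SEI's actual encoder and alignment objective are not reproduced.

```python
# Minimal sketch: retrieve similar historical cases by cosine similarity of
# image embeddings (a stand-in for SEI's gradient-free retrieval). Embeddings
# are random placeholders; in SEI they would come from the aligned image encoder.
import torch
import torch.nn.functional as F

def retrieve_similar_cases(query_emb, memory_embs, top_k=3):
    """Return indices of the top-k most similar historical cases."""
    q = F.normalize(query_emb, dim=-1)     # (d,)
    m = F.normalize(memory_embs, dim=-1)   # (N, d)
    sims = m @ q                           # cosine similarities, (N,)
    return sims.topk(top_k).indices

memory = torch.randn(1000, 512)   # embeddings of historical X-ray studies
query = torch.randn(512)          # embedding of the current X-ray
print(retrieve_similar_cases(query, memory, top_k=3))
```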
Abstract:In response to the worldwide COVID-19 pandemic, advanced automated technologies have emerged as valuable tools to aid healthcare professionals in managing an increased workload by improving radiology report generation and prognostic analysis. This study proposes the Multi-modality Regional Alignment Network (MRANet), an explainable model for radiology report generation and survival prediction that focuses on high-risk regions. By learning spatial correlations in the detector, MRANet visually grounds region-specific descriptions, providing robust anatomical regions via a completion strategy. The visual features of each region are embedded using a novel survival attention mechanism, offering spatially and risk-aware features for sentence encoding while maintaining global coherence across tasks. A cross-LLM alignment is employed to enhance the image-to-text transfer process, resulting in sentences rich in clinical detail and improved explainability for radiologists. Multi-center experiments validate both MRANet's overall performance and the contribution of each module within the model, encouraging further advances in radiology report generation research that emphasize clinical interpretation and trustworthiness of AI models applied to medical studies. The code is available at https://github.com/zzs95/MRANet.
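The following is one possible reading of a risk-aware ("survival") attention over detected anatomical regions, sketched for illustration only; the layer names, feature dimensions, and region count are assumptions and do not reproduce MRANet's exact architecture.

```python
# Minimal sketch: each detected anatomical region receives an attention weight
# from a small risk head; the weighted features feed both sentence encoding and
# a global risk output. Illustrative interpretation, not MRANet's actual layers.
import torch
import torch.nn as nn

class SurvivalAttention(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-region attention logit
        self.risk = nn.Linear(dim, 1)    # global risk score head

    def forward(self, region_feats):     # (B, R, dim) for R regions
        attn = torch.softmax(self.score(region_feats), dim=1)   # (B, R, 1)
        pooled = (attn * region_feats).sum(dim=1)               # (B, dim)
        return pooled, self.risk(pooled), attn

feats = torch.randn(2, 29, 256)          # e.g. 29 anatomical regions per image
pooled, risk, attn = SurvivalAttention(256)(feats)
print(pooled.shape, risk.shape, attn.shape)
```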
Abstract:The COVID-19 pandemic has strained global public health systems, necessitating accurate diagnosis and intervention to control disease spread and reduce mortality rates. This paper introduces an interpretable deep survival prediction model designed specifically for improved understanding of and trust in COVID-19 prognosis using chest X-ray (CXR) images. By integrating a large-scale pretrained image encoder, Risk-specific Grad-CAM, and anatomical region detection techniques, our approach produces regional interpretable outcomes that effectively capture essential disease features while focusing on rare but critical abnormal regions. Through risk-area localization, the model's predictions offer enhanced clarity and transparency, enabling clinicians to make informed decisions regarding COVID-19 diagnosis with a better understanding of the prognostic evidence. We evaluate the proposed method on a multi-center survival dataset and demonstrate its effectiveness via quantitative and qualitative assessments, achieving superior C-indexes (0.764 and 0.727) and time-dependent AUCs (0.799 and 0.691). These results suggest that our explainable deep survival prediction model surpasses traditional survival analysis methods in risk prediction, improving interpretability for clinical decision making and enhancing AI system trustworthiness.
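A minimal sketch of Grad-CAM driven by a scalar risk output rather than a class logit, which is one plausible reading of "Risk-specific Grad-CAM"; the backbone, risk head, and input below are placeholders, not the paper's pretrained encoder.

```python
# Minimal sketch: Grad-CAM with respect to a scalar risk score. Gradients of the
# risk output are pooled into channel weights, which reweight the feature maps
# into a risk heatmap. Backbone and risk head are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
risk_head = nn.Linear(32, 1)

x = torch.randn(1, 1, 224, 224)             # a CXR-sized input
feats = backbone(x)                         # (1, 32, 224, 224)
feats.retain_grad()                         # keep gradients on this non-leaf tensor
risk = risk_head(feats.mean(dim=(2, 3)))    # global-average-pooled risk score
risk.sum().backward()

weights = feats.grad.mean(dim=(2, 3), keepdim=True)   # per-channel importance
cam = F.relu((weights * feats).sum(dim=1))             # (1, 224, 224) risk heatmap
cam = cam / (cam.max() + 1e-8)
print(cam.shape)
```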
Abstract:Medical image segmentation based on deep learning often fails when deployed on images from a different domain. Domain adaptation methods aim to address this domain-shift challenge but still face problems: transfer learning requires annotations on the target domain, and generative unsupervised domain adaptation (UDA) models ignore domain-specific representations, so the quality of their generated images strongly restricts segmentation performance. In this study, we propose a novel Structure-Modal Constrained (SMC) UDA framework based on a discriminative paradigm and introduce edge structure as a bridge between domains. The proposed multi-modal learning backbone distills structure information from image texture to distinguish domain-invariant edge structure. With structure-constrained self-learning and a progressive ROI, our method segments the kidney by locating the 3D spatial structure of its edges. We evaluated SMC-UDA on public renal segmentation datasets, adapting from a labeled source domain (CT) to an unlabeled target domain (CT/MRI). The experiments show that the proposed SMC-UDA generalizes strongly and outperforms generative UDA methods.
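For illustration, the sketch below extracts edge structure from image intensity with a fixed 2D Sobel filter, the kind of domain-invariant cue the framework builds on; the paper itself distills 3D edge structure with a learned multi-modal backbone, which is not reproduced here.

```python
# Minimal sketch: edge-magnitude extraction from a CT/MRI slice with a fixed
# Sobel filter, illustrating why edge structure can bridge imaging domains.
import torch
import torch.nn.functional as F

def sobel_edges(img):                      # img: (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                # Sobel kernel for the other axis
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)   # edge magnitude map

slice_ct = torch.randn(1, 1, 128, 128)     # placeholder slice
print(sobel_edges(slice_ct).shape)         # same spatial size as the input
```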
Abstract:Medical image segmentation is usually regarded as one of the most important intermediate steps in clinical workflows and medical imaging research. Thus, accurately assessing the segmentation quality of automatically generated predictions is essential for guaranteeing the reliability of computer-assisted diagnosis (CAD). Many researchers train neural-network regression models to estimate the segmentation quality of a new data cohort without labeled ground truth. Recently, a novel idea was proposed to transform the segmentation quality assessment (SQA) problem into a pixel-wise error map prediction task, cast as a segmentation problem. However, naively applying vanilla segmentation architectures to medical images fails to detect small and thin error regions of auto-generated masks with complex anatomical structures. In this paper, we propose a collaborative boundary-aware context encoding network, called AEP-Net, for the error prediction task. Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks and precise localization of error regions, and a context encoding module that uses a global predictor derived from the error map to enhance the feature representation and regularize the network. We perform experiments on the IBSR v2.0 and ACDC datasets. AEP-Net achieves average DSCs of 0.8358 and 0.8164 on the error prediction task, and shows a high Pearson correlation coefficient of 0.9873 between the actual segmentation accuracy and the accuracy inferred from the predicted error map on the IBSR v2.0 dataset, which verifies the efficacy of our AEP-Net.
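The sketch below illustrates how a pixel-wise error map can be converted into an estimated Dice score and correlated with the actual Dice, mirroring the evaluation described; the masks are synthetic, and the "predicted" error map is replaced by the true one for simplicity, so the correlation is exact rather than the reported 0.9873.

```python
# Minimal sketch: derive an estimated Dice score from an error map plus the
# auto-generated mask (no ground truth needed), then correlate with actual Dice.
import numpy as np
from scipy.stats import pearsonr

def dice(a, b):
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + 1e-8)

rng = np.random.default_rng(0)
true_dice, est_dice = [], []
for _ in range(20):
    gt = rng.random((64, 64)) > 0.5             # ground-truth mask
    pred = gt ^ (rng.random((64, 64)) > 0.9)    # auto-generated mask with some errors
    err_map = gt ^ pred                         # actual error map; AEP-Net would predict this
    fp = np.logical_and(err_map, pred).sum()    # mask pixels flagged as wrong
    fn = np.logical_and(err_map, ~pred).sum()   # missed pixels flagged as wrong
    tp = pred.sum() - fp
    true_dice.append(dice(gt, pred))
    est_dice.append(2.0 * tp / (2.0 * tp + fp + fn + 1e-8))

r, _ = pearsonr(true_dice, est_dice)
print(f"Pearson r between actual and error-map-derived Dice: {r:.3f}")
```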
Abstract:3D image segmentation is one of the most important and ubiquitous problems in medical image processing, providing detailed quantitative analysis for accurate disease diagnosis, abnormality detection, and classification. Deep learning algorithms are now widely used for medical image segmentation, and most train models on fully annotated datasets. However, medical image datasets are difficult and expensive to obtain, and full annotation of 3D medical images is monotonous and time-consuming. Partially labelling only the informative slices in 3D images would greatly relieve this manual annotation burden. Sample selection strategies based on active learning have been proposed for 2D images, but few focus on 3D images. In this paper, we propose a sparse annotation strategy based on attention-guided active learning for 3D medical image segmentation. An attention mechanism is used both to improve segmentation accuracy and to estimate the segmentation accuracy of each slice. Comparative experiments with three different strategies on datasets from the Developing Human Connectome Project (dHCP) show that our strategy needs only 15% to 20% of annotated slices for the brain extraction task and 30% to 35% for the tissue segmentation task to achieve results comparable to full annotation.
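As a simplified illustration of informative-slice selection, the sketch below ranks slices of a 3D probability volume by prediction entropy and keeps a fixed annotation budget; the attention-based per-slice accuracy estimate used in the paper is replaced here by this entropy heuristic.

```python
# Minimal sketch: pick the most uncertain slices of a 3D volume for annotation,
# using per-slice prediction entropy as a stand-in for an attention-derived
# per-slice accuracy estimate.
import numpy as np

def select_slices(prob_volume, budget=0.2):
    """prob_volume: (Z, H, W) foreground probabilities; returns slice indices to annotate."""
    p = np.clip(prob_volume, 1e-6, 1 - 1e-6)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p)).mean(axis=(1, 2))  # per-slice uncertainty
    n_pick = max(1, int(budget * prob_volume.shape[0]))
    return np.argsort(entropy)[-n_pick:]       # the most uncertain slices

probs = np.random.default_rng(0).random((100, 128, 128))   # dummy model output
print(sorted(select_slices(probs, budget=0.15)))            # ~15% of slices selected
```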
Abstract:Cephalometric tracing is commonly used in orthodontic diagnosis and treatment planning. In this paper, we propose a deep learning based framework to automatically detect anatomical landmarks in cephalometric X-ray images. We train a deep encoder-decoder for landmark detection and combine the global landmark configuration with local high-resolution feature responses. The proposed framework is based on a 2-stage U-Net that regresses multi-channel heatmaps for landmark detection. In this framework, we embed an attention mechanism with the global-stage heatmaps to guide local-stage inference, regressing local heatmap patches at high resolution. In addition, an Expansive Exploration strategy improves robustness at inference time by expanding the search scope without increasing model complexity. We have evaluated our framework on the most widely used public dataset for landmark detection in cephalometric X-ray images. With less computation and manual tuning, our framework achieves state-of-the-art results.
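A minimal sketch of decoding landmark coordinates from multi-channel heatmaps, the output representation a heatmap-regressing U-Net produces (one channel per landmark); the landmark count, resolution, and peak below are synthetic placeholders.

```python
# Minimal sketch: take the per-channel peak of predicted heatmaps as the
# landmark coordinates (one heatmap channel per landmark).
import torch

def heatmaps_to_landmarks(heatmaps):
    """heatmaps: (B, L, H, W) -> integer (x, y) coordinates, shape (B, L, 2)."""
    b, l, h, w = heatmaps.shape
    flat_idx = heatmaps.view(b, l, -1).argmax(dim=-1)        # peak per channel
    ys = torch.div(flat_idx, w, rounding_mode="floor")
    xs = flat_idx % w
    return torch.stack([xs, ys], dim=-1)

heatmaps = torch.zeros(1, 19, 256, 256)       # e.g. 19 cephalometric landmarks
heatmaps[0, 0, 120, 200] = 1.0                # synthetic peak for landmark 0
print(heatmaps_to_landmarks(heatmaps)[0, 0])  # tensor([200, 120])
```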