Abstract:CT report generation (CTRG) aims to automatically generate diagnostic reports for 3D volumes, relieving clinicians' workload and improving patient care. Despite clinical value, existing works fail to effectively incorporate diagnostic information from multiple anatomical views and lack related clinical expertise essential for accurate and reliable diagnosis. To resolve these limitations, we propose a novel Multi-view perception Knowledge-enhanced Tansformer (MvKeTR) to mimic the diagnostic workflow of clinicians. Just as radiologists first examine CT scans from multiple planes, a Multi-View Perception Aggregator (MVPA) with view-aware attention effectively synthesizes diagnostic information from multiple anatomical views. Then, inspired by how radiologists further refer to relevant clinical records to guide diagnostic decision-making, a Cross-Modal Knowledge Enhancer (CMKE) retrieves the most similar reports based on the query volume to incorporate domain knowledge into the diagnosis procedure. Furthermore, instead of traditional MLPs, we employ Kolmogorov-Arnold Networks (KANs) with learnable nonlinear activation functions as the fundamental building blocks of both modules to better capture intricate diagnostic patterns in CT interpretation. Extensive experiments on the public CTRG-Chest-548K dataset demonstrate that our method outpaces prior state-of-the-art models across all metrics.
Abstract:Patient-reported outcomes (PROs) directly collected from cancer patients being treated with radiation therapy play a vital role in assisting clinicians in counseling patients regarding likely toxicities. Precise prediction and evaluation of symptoms or health status associated with PROs are fundamental to enhancing decision-making and planning for the required services and support as patients transition into survivorship. However, the raw PRO data collected from hospitals exhibits some intrinsic challenges such as incomplete item reports and imbalance patient toxicities. To the end, in this study, we explore various machine learning techniques to predict patient outcomes related to health status such as pain levels and sleep discomfort using PRO datasets from a cancer photon/proton therapy center. Specifically, we deploy six advanced machine learning classifiers -- Random Forest (RF), XGBoost, Gradient Boosting (GB), Support Vector Machine (SVM), Multi-Layer Perceptron with Bagging (MLP-Bagging), and Logistic Regression (LR) -- to tackle a multi-class imbalance classification problem across three prevalent cancer types: head and neck, prostate, and breast cancers. To address the class imbalance issue, we employ an oversampling strategy, adjusting the training set sample sizes through interpolations of in-class neighboring samples, thereby augmenting minority classes without deviating from the original skewed class distribution. Our experimental findings across multiple PRO datasets indicate that the RF and XGB methods achieve robust generalization performance, evidenced by weighted AUC and detailed confusion matrices, in categorizing outcomes as mild, intermediate, and severe post-radiation therapy. These results underscore the models' effectiveness and potential utility in clinical settings.
Abstract:Symmetrical NMR spectroscopy constitutes a vital branch of multidimensional NMR spectroscopy, providing a powerful tool for the structural elucidation of biological macromolecules. Non-Uniform Sampling (NUS) serves as an effective strategy for averting the prohibitive acquisition time of multidimensional NMR spectroscopy by only sampling a few points according to NUS sampling schedules and reconstructing missing points via algorithms. However, current sampling schedules are unable to maintain the accurate recovery of cross peaks that are weak but important. In this work, we propose a novel sampling schedule termed as SCPG (Symmetrical Copy Poisson Gap) and employ CS (Compressed Sensing) methods for reconstruction. We theoretically prove that the symmetrical constraint, apart from sparsity, is implicitly implemented when SCPG is combined with CS methods. The simulated and experimental data substantiate the advantage of SCPG over state-of-the-art 2D Woven PG in the NUS reconstruction of symmetrical NMR spectroscopy.
Abstract:Purpose: To develop and evaluate a novel dynamic-convolution-based method called FlexDTI for high-efficient diffusion tensor reconstruction with flexible diffusion encoding gradient schemes. Methods: FlexDTI was developed to achieve high-quality DTI parametric mapping with flexible number and directions of diffusion encoding gradients. The proposed method used dynamic convolution kernels to embed diffusion gradient direction information into feature maps of the corresponding diffusion signal. Besides, our method realized the generalization of a flexible number of diffusion gradient directions by setting the maximum number of input channels of the network. The network was trained and tested using data sets from the Human Connectome Project and a local hospital. Results from FlexDTI and other advanced tensor parameter estimation methods were compared. Results: Compared to other methods, FlexDTI successfully achieves high-quality diffusion tensor-derived variables even if the number and directions of diffusion encoding gradients are variable. It increases peak signal-to-noise ratio (PSNR) by about 10 dB on Fractional Anisotropy (FA) and Mean Diffusivity (MD), compared with the state-of-the-art deep learning method with flexible diffusion encoding gradient schemes. Conclusion: FlexDTI can well learn diffusion gradient direction information to achieve generalized DTI reconstruction with flexible diffusion gradient schemes. Both flexibility and reconstruction quality can be taken into account in this network.
Abstract:Magnetic resonance imaging (MRI) is a principal radiological modality that provides radiation-free, abundant, and diverse information about the whole human body for medical diagnosis, but suffers from prolonged scan time. The scan time can be significantly reduced through k-space undersampling but the introduced artifacts need to be removed in image reconstruction. Although deep learning (DL) has emerged as a powerful tool for image reconstruction in fast MRI, its potential in multiple imaging scenarios remains largely untapped. This is because not only collecting large-scale and diverse realistic training data is generally costly and privacy-restricted, but also existing DL methods are hard to handle the practically inevitable mismatch between training and target data. Here, we present a Physics-Informed Synthetic data learning framework for Fast MRI, called PISF, which is the first to enable generalizable DL for multi-scenario MRI reconstruction using solely one trained model. For a 2D image, the reconstruction is separated into many 1D basic problems and starts with the 1D data synthesis, to facilitate generalization. We demonstrate that training DL models on synthetic data, integrated with enhanced learning techniques, can achieve comparable or even better in vivo MRI reconstruction compared to models trained on a matched realistic dataset, reducing the demand for real-world MRI data by up to 96%. Moreover, our PISF shows impressive generalizability in multi-vendor multi-center imaging. Its excellent adaptability to patients has been verified through 10 experienced doctors' evaluations. PISF provides a feasible and cost-effective way to markedly boost the widespread usage of DL in various fast MRI applications, while freeing from the intractable ethical and practical considerations of in vivo human data acquisitions.
Abstract:Soft-thresholding has been widely used in neural networks. Its basic network structure is a two-layer convolution neural network with soft-thresholding. Due to the network's nature of nonlinearity and nonconvexity, the training process heavily depends on an appropriate initialization of network parameters, resulting in the difficulty of obtaining a globally optimal solution. To address this issue, a convex dual network is designed here. We theoretically analyze the network convexity and numerically confirm that the strong duality holds. This conclusion is further verified in the linear fitting and denoising experiments. This work provides a new way to convexify soft-thresholding neural networks.
Abstract:Data-driven learning algorithm has been successfully applied to facilitate reconstruction of medical imaging. However, real-world data needed for supervised learning are typically unavailable or insufficient, especially in the field of magnetic resonance imaging (MRI). Synthetic training samples have provided a potential solution for such problem, while the challenge brought by various non-ideal situations were usually encountered especially under complex experimental conditions. In this study, a general framework, Model-based Synthetic Data-driven Learning (MOST-DL), was proposed to generate paring data for network training to achieve robust T2 mapping using overlapping-echo acquisition under severe head motion accompanied with inhomogeneous RF field. We decomposed this challenging task into parallel reconstruction and motion correction according to a forward model. The neural network was first trained in pure synthetic dataset and then evaluated with in vivo human brain. Experiments showed that MOST-DL method significantly reduces ghosting and motion artifacts in T2 maps in the presence of random and continuous subject movement. We believe that the proposed approach may open a door for solving similar problems with other MRI acquisition methods and can be extended to other areas of medical imaging.
Abstract:The increasing concerns about data privacy and security drives the emergence of a new field of studying privacy-preserving machine learning from isolated data sources, i.e., \textit{federated learning}. Vertical federated learning, where different parties hold different features for common users, has a great potential of driving a more variety of business cooperation among enterprises in different fields. Decision tree models especially decision tree ensembles are a class of widely applied powerful machine learning models with high interpretability and modeling efficiency. However, the interpretability are compromised in these works such as SecureBoost since the feature names are not exposed to avoid possible data breaches due to the unprotected decision path. In this paper, we shall propose Fed-EINI, an efficient and interpretable inference framework for federated decision tree models with only one round of multi-party communication. We shall compute the candidate sets of leaf nodes based on the local data at each party in parallel, followed by securely computing the weight of the only leaf node in the intersection of the candidate sets. We propose to protect the decision path by the efficient additively homomorphic encryption method, which allows the disclosure of feature names and thus makes the federated decision trees interpretable. The advantages of Fed-EINI will be demonstrated through theoretical analysis and extensive numerical results. Experiments show that the inference efficiency is improved by over $50\%$ in average.
Abstract:Nuclear magnetic resonance (NMR) spectroscopy serves as an indispensable tool in chemistry and biology but often suffers from long experimental time. We present a proof-of-concept of application of deep learning and neural network for high-quality, reliable, and very fast NMR spectra reconstruction from limited experimental data. We show that the neural network training can be achieved using solely synthetic NMR signal, which lifts the prohibiting demand for a large volume of realistic training data usually required in the deep learning approach.
Abstract:Purpose: An end-to-end deep convolutional neural network (CNN) based on deep residual network (ResNet) was proposed to efficiently reconstruct reliable T2 mapping from single-shot OverLapping-Echo Detachment (OLED) planar imaging. Methods: The training dataset was obtained from simulations carried out on SPROM software developed by our group. The relationship between the original OLED image containing two echo signals and the corresponded T2 mapping was learned by ResNet training. After the ResNet was trained, it was applied to reconstruct the T2 mapping from simulation and in vivo human brain data. Results: Though the ResNet was trained entirely on simulated data, the trained network was generalized well to real human brain data. The results from simulation and in vivo human brain experiments show that the proposed method significantly outperformed the echo-detachment-based method. Reliable T2 mapping was achieved within tens of milliseconds after the network had been trained while the echo-detachment-based OLED reconstruction method took minutes. Conclusion: The proposed method will greatly facilitate real-time dynamic and quantitative MR imaging via OLED sequence, and ResNet has the potential to reconstruct images from complex MRI sequence efficiently.