Abstract:Recent advancements in medical imaging have resulted in more complex and diverse images, posing challenges such as high anatomical variability, blurred tissue boundaries, low organ contrast, and noise. Traditional segmentation methods struggle to address these challenges, making deep learning approaches, particularly U-shaped architectures, increasingly prominent. However, the quadratic complexity of standard self-attention makes Transformers computationally prohibitive for high-resolution images. To address this, we propose MLLA-UNet (Mamba-Like Linear Attention UNet), a novel architecture that achieves linear computational complexity while maintaining high segmentation accuracy by combining linear attention with Mamba-inspired adaptive mechanisms, complemented by an efficient symmetric sampling structure for enhanced feature processing. Our architecture preserves essential spatial features while capturing long-range dependencies at reduced computational cost. Additionally, we introduce a novel sampling strategy for multi-scale feature fusion. Experiments demonstrate that MLLA-UNet achieves state-of-the-art performance on 24 segmentation tasks across six challenging datasets, including FLARE22, AMOS CT, and ACDC, with an average DSC of 88.32%. These results indicate that MLLA-UNet outperforms existing methods. Our contributions include the novel 2D segmentation architecture and its empirical validation. The code is available via https://github.com/csyfjiang/MLLA-UNet.
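The linear-complexity claim rests on reordering the attention product so that the N x N attention matrix is never formed. The following is a minimal sketch of generic kernelized linear attention, not MLLA-UNet's implementation: the elu(x)+1 feature map and the function names are assumptions, and the Mamba-inspired gating and symmetric sampling structure are not modeled. A quadratic reference using the same kernel is included to check that both orderings agree.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1 (an assumed choice of kernel)
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v, eps=1e-6):
    """O(N) attention: phi(Q) (phi(K)^T V), normalized by phi(Q) (phi(K)^T 1).

    q, k: (N, d); v: (N, d_v). Cost grows linearly in the number of tokens N."""
    phi_q, phi_k = elu_plus_one(q), elu_plus_one(k)
    kv = phi_k.T @ v                    # (d, d_v) summary computed once for all queries
    z = phi_q @ phi_k.sum(axis=0)       # (N,) per-query normalizer
    return (phi_q @ kv) / (z[:, None] + eps)

def quadratic_reference(q, k, v, eps=1e-6):
    # Same kernel evaluated the O(N^2) way, only for checking equivalence
    a = elu_plus_one(q) @ elu_plus_one(k).T   # (N, N) similarity matrix
    return (a @ v) / (a.sum(axis=1, keepdims=True) + eps)
```

For 2D segmentation, N would be the number of spatial tokens (H x W), which is why avoiding the N x N matrix matters at high resolution.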
Abstract:In certain brain volumetric studies, synthetic T1-weighted magnetization-prepared rapid gradient-echo (MP-RAGE) contrast derived from quantitative T1 MRI (T1-qMRI) is highly valuable because its clear white/gray matter boundaries facilitate brain segmentation. However, generating synthetic MP-RAGE (syn-MP-RAGE) typically requires pairs of high-quality, artifact-free, multi-modality inputs, which can be difficult to obtain in retrospective studies, where missing or corrupted data are common. To overcome this limitation, we explore the feasibility of a deep learning-based approach that synthesizes syn-MP-RAGE contrast directly from a single-channel turbo spin-echo (TSE) input, which is known for its robustness to metal artifacts. We evaluated this deep learning-based synthetic MP-RAGE (DL-Syn-MPR) on 31 non-artifact and 11 metal-artifact subjects. Segmentations derived from DL-Syn-MPR consistently showed high agreement with the reference segmentations, with Dice Similarity Coefficient (DSC) values above 0.83, while requiring fewer inputs. Moreover, no significant difference in segmentation performance was observed between the artifact and non-artifact groups.
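Because DSC is the agreement measure reported here (and in several other abstracts in this collection), a minimal NumPy sketch of how it is typically computed for binary masks may help. The per-label helper and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the study.

```python
import numpy as np

def dice(pred, ref):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a common convention)
    return float(2.0 * np.logical_and(pred, ref).sum() / total)

def dice_per_label(pred_labels, ref_labels, labels):
    # One DSC per structure in a multi-label segmentation (e.g. gray vs white matter)
    pred_labels, ref_labels = np.asarray(pred_labels), np.asarray(ref_labels)
    return {lab: dice(pred_labels == lab, ref_labels == lab) for lab in labels}
```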
Abstract:Prostate cancer is a major threat to health, and early detection is vital for reducing mortality among prostate cancer patients. One approach uses multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging because of image imperfections and the prostate's complex tissue structure. The advent of precision medicine and a significant increase in clinical capacity have spurred demand for data-driven tasks in medical imaging, and numerous machine learning and data mining tools have recently been integrated into various medical areas, including image segmentation. This article proposes a new classification scheme that distinguishes methods by the type of supervision used during the training phase, differing either in number or in kind. We then survey artificial intelligence (AI)-based automatic prostate segmentation methods, examining the advantages and limitations of each. Additionally, we review variants of evaluation metrics for verifying and assessing segmentation performance and summarize the current challenges. Finally, we discuss future research directions and development trends that emerge from our literature survey, identifying high-precision detection and treatment of prostate cancer as a promising avenue.
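Segmentation evaluation usually pairs an overlap score with a boundary-sensitive distance. As one illustration (the choice of metric and this brute-force implementation are assumptions, not taken from the article), the symmetric Hausdorff distance between two binary masks can be sketched as follows. This version uses all foreground voxels; many toolkits use surface voxels instead, and the HD95 variant takes the 95th percentile rather than the maximum.

```python
import numpy as np

def hausdorff_distance(mask_a, mask_b, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance between the foreground voxels of two binary masks.

    spacing converts index offsets to physical units (e.g. mm) and must have one entry
    per image axis. Brute force: memory and time grow as |A| * |B|."""
    pa = np.argwhere(np.asarray(mask_a, bool)) * np.asarray(spacing, float)
    pb = np.argwhere(np.asarray(mask_b, bool)) * np.asarray(spacing, float)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks must contain foreground")
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)  # pairwise distances
    # Largest nearest-neighbor distance from A to B and from B to A
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```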
Abstract:Cone-beam computed tomography (CBCT) is routinely acquired during image-guided radiation therapy (IGRT) to provide updated patient anatomy for cancer treatment. However, CBCT images often suffer from streaking artifacts and noise caused by undersampled projections and low-dose exposure, resulting in low clarity and information loss. Although recent deep learning-based CBCT enhancement methods have shown promising results in suppressing artifacts, they preserve anatomical details poorly, because conventional pixel-to-pixel loss functions cannot describe detailed anatomy. To address this issue, we propose a novel feature-oriented deep learning framework that translates low-quality CBCT images into high-quality CT-like images via a multi-task customized feature-to-feature perceptual loss function. The framework comprises two main components: a multi-task learning feature-selection network (MTFS-Net) for customizing the perceptual loss function, and a CBCT-to-CT translation network guided by the feature-to-feature perceptual loss, which uses generative models such as U-Net, GAN, and CycleGAN. Our experiments showed that the proposed framework can generate synthesized CT (sCT) images of the lung with high similarity to CT images, achieving an average SSIM of 0.9869 and an average PSNR of 39.9621 dB. The sCT images were also visually pleasing, with effective artifact suppression, noise reduction, and distinct preservation of anatomical details. These results indicate that the proposed framework outperforms state-of-the-art models for pulmonary CBCT enhancement. This framework holds great promise for generating high-quality anatomical images from CBCT that are suitable for various clinical applications.
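The core idea is to compare images in a feature space rather than pixel by pixel. A minimal sketch, assuming a hypothetical fixed extractor built from finite differences at two scales (standing in for the learned MTFS-Net features, which are not reproduced here), shows that a feature-to-feature loss ignores a global intensity offset that pixel MSE penalizes, while still reacting to structural changes such as added noise. All function names are illustrative.

```python
import numpy as np

def toy_feature_extractor(img):
    """Hypothetical stand-in for a learned feature network: returns a list of
    feature maps (vertical/horizontal finite differences at two scales), which
    respond to edges rather than to absolute intensity."""
    feats = []
    x = np.asarray(img, float)
    for _ in range(2):
        feats.append(np.diff(x, axis=0))
        feats.append(np.diff(x, axis=1))
        x = x[::2, ::2]  # crude downsampling to the next scale
    return feats

def perceptual_loss(pred, target, extractor=toy_feature_extractor):
    # Feature-to-feature loss: sum over feature maps of the mean squared difference
    return sum(np.mean((fp - ft) ** 2) for fp, ft in zip(extractor(pred), extractor(target)))

def pixel_mse(pred, target):
    # Conventional pixel-to-pixel loss, for comparison
    return np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
```

In the framework described above, the features would instead come from the trained MTFS-Net; how its layers are selected and weighted inside the loss is not specified in the abstract.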
Abstract:With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLMs' capacity for ToM, and their similarity to those of humans, remain largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two: hidden embeddings (artificial neurons) within LLMs exhibited significant responsiveness to either true- or false-belief trials, suggesting their ability to represent another's perspective. These artificial embedding responses were closely correlated with the LLMs' performance on the ToM tasks, a property that depended on model size. Further, the other's beliefs could be accurately decoded from the entire set of embeddings, indicating the presence of the embeddings' ToM capability at the population level. Together, our findings reveal an emergent property of LLMs' embeddings, whose activity changes in response to ToM features, offering initial evidence of a parallel between the artificial model and neurons in the human brain.
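Population-level decoding means training a classifier on whole embedding vectors and testing it on held-out trials. A hedged sketch of that procedure follows, using a simple nearest-centroid decoder and a random train/test split; the study's actual decoder, layers, and cross-validation scheme are not specified here, and the function names are hypothetical.

```python
import numpy as np

def nearest_centroid_decode(train_x, train_y, test_x):
    """Assign each test embedding to the class whose training mean is closest."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

def decoding_accuracy(x, y, train_frac=0.8, seed=0):
    # Random train/test split; accuracy of decoding the label (e.g. true vs false belief)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    pred = nearest_centroid_decode(x[tr], y[tr], x[te])
    return float(np.mean(pred == y[te]))
```

On purely synthetic inputs, for instance two classes of 50-dimensional Gaussian vectors whose means differ by 3 in every dimension, the accuracy is close to 1.0; on real embeddings one would compare against a label-shuffled baseline.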
Abstract:Four-dimensional magnetic resonance imaging (4D-MRI) is an emerging technique for tumor motion management in image-guided radiation therapy (IGRT). However, current 4D-MRI suffers from low spatial resolution and strong motion artifacts owing to the long acquisition time and patients' respiratory variations; if not managed properly, these limitations can adversely affect treatment planning and delivery in IGRT. Herein, we developed a novel deep learning framework, the coarse-super-resolution-fine network (CoSF-Net), to achieve simultaneous motion estimation and super-resolution in a unified model. We designed CoSF-Net by fully exploiting the inherent properties of 4D-MRI while accounting for limited and imperfectly matched training datasets. We conducted extensive experiments on multiple real patient datasets to verify the feasibility and robustness of the network. Compared with existing networks and three state-of-the-art conventional algorithms, CoSF-Net not only accurately estimated the deformation vector fields between the respiratory phases of 4D-MRI but also simultaneously improved its spatial resolution with enhanced anatomical features, yielding 4D-MR images with high spatiotemporal resolution.
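Motion estimation here produces dense vector fields that map one respiratory phase onto another. To make concrete how such a field is applied, the following is a minimal sketch of backward warping with bilinear interpolation; the 2D setting, border clamping, and the name `warp_2d` are assumptions for illustration (CoSF-Net itself operates on 4D-MRI, and its warping layer is not described here).

```python
import numpy as np

def warp_2d(image, dvf):
    """Backward-warp a 2D image with a dense displacement field.

    image: (H, W); dvf: (2, H, W) holding (row, col) displacements in pixels.
    out[y, x] = image(y + dvf[0, y, x], x + dvf[1, y, x]) via bilinear interpolation,
    with sample coordinates clamped to the image border."""
    H, W = image.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sy = np.clip(yy + dvf[0], 0, H - 1)
    sx = np.clip(xx + dvf[1], 0, W - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = sy - y0, sx - x0  # fractional parts used as interpolation weights
    return ((1 - wy) * (1 - wx) * image[y0, x0] + (1 - wy) * wx * image[y0, x1]
            + wy * (1 - wx) * image[y1, x0] + wy * wx * image[y1, x1])
```

Warping a moving phase with an estimated field and comparing it to the fixed phase is one common way such fields are trained and evaluated.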
Abstract:The inherent characteristics of lung tissues, which are independent of breathing manoeuvre, may provide fundamental information on lung function. This paper studies function-correlated lung textures and their spatial distribution on CT. Data were collected from 21 lung cancer patients with thoracic 4DCT scans, DTPA-SPECT ventilation images (V), and available pulmonary function test (PFT) measurements. Seventy-nine radiomic features were included in the analysis, and a sparse-to-fine strategy comprising subregional feature discovery and a voxel-wise feature distribution study was carried out to identify function-correlated radiomic features. At the subregion level, lung CT images were partitioned into patches labeled as defect or non-defect according to the reference V. At the voxel-wise level, feature maps (FMs) of the selected feature candidates were generated for each 4DCT phase. Quantitative metrics were applied to validate the results: the Spearman correlation coefficient (SCC) and Dice similarity coefficient (DSC) for FM-V spatial agreement, the intraclass correlation coefficient (ICC) for FM robustness, and FM-PFT comparisons. At the subregion level, eight function-correlated features were identified with medium-to-large statistical strength (effect size > 0.330) in differentiating defect from non-defect lung regions. At the voxel-wise level, FMs of the candidates yielded moderate-to-strong voxel-wise correlations with the reference V. Among them, FMs of GLDM Dependence Non-uniformity showed the most robust (ICC = 0.96) spatial correlation, with median SCCs ranging from 0.54 to 0.59 across the ten phases. Its phase-averaged FM achieved a median SCC of 0.60, median DSCs of 0.60 and 0.65 for high- and low-functional lung volumes, respectively, and a correlation of 0.646 between the spatially averaged feature values and PFT measurements.
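Voxel-wise agreement between a feature map and the reference ventilation image is summarized by Spearman's coefficient, which is the Pearson correlation of rank-transformed values. A minimal sketch follows; restricting the computation to a lung mask, averaging tied ranks, and the helper names are assumptions about a typical pipeline rather than details from the paper.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    x = np.asarray(x, float).ravel()
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inv = np.unique(x, return_inverse=True)
    inv = inv.ravel()
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]

def spearman(a, b):
    # Spearman coefficient = Pearson correlation of the rank-transformed values
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def voxelwise_scc(feature_map, ventilation, lung_mask):
    """Hypothetical helper: SCC between a feature map and ventilation over lung voxels only."""
    m = np.asarray(lung_mask, bool)
    return spearman(feature_map[m], ventilation[m])
```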
Abstract:Predicting adaptive radiation therapy (ART) before radiation therapy (RT) for nasopharyngeal carcinoma (NPC) patients is important for reducing toxicity and prolonging patient survival. Currently, owing to the complex tumor microenvironment, a single type of high-resolution image provides only limited information. Meanwhile, the traditional softmax-based loss is insufficient for quantifying the discriminative power of a model. To overcome these challenges, we propose a supervised multi-view contrastive learning method with an additive margin (MMCon). For each patient, four medical images are used to form multi-view positive pairs, which provide additional information and enhance the representation of medical images. The embedding space is learned through contrastive learning: NPC samples from the same patient or with similar labels are kept close in the embedding space, whereas NPC samples with different labels are pushed far apart. To improve the discriminative ability of the loss function, we incorporate a margin into the contrastive learning. Experimental results show that this new learning objective yields an embedding space with superior discrimination ability for NPC images.
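A minimal sketch of a supervised contrastive loss with an additive margin follows. It assumes the margin is subtracted from the cosine similarity of every positive pair before temperature scaling, which is one plausible reading; MMCon's exact formulation, temperature, and multi-view sampling are not given in the abstract. Views of the same patient share a label, so they count as positives.

```python
import numpy as np

def supcon_margin_loss(z, labels, tau=0.1, margin=0.0):
    """Supervised contrastive loss with an additive margin on positive pairs.

    z: (N, d) embeddings, one row per view (rows must be nonzero); labels: (N,).
    Assumption: margin is subtracted from each positive pair's cosine similarity,
    so positives must be more similar than negatives by at least the margin."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    labels = np.asarray(labels)
    n = len(labels)
    eye = np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & ~eye
    logits = (z @ z.T - margin * pos) / tau
    logits = np.where(eye, -np.inf, logits)          # exclude self-similarity
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom                     # log-softmax over all a != i
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    # Average over each anchor's positives, then over anchors that have positives
    return float(-(pos_log_prob[has_pos] / n_pos[has_pos]).mean())
```

With negatives present, a positive margin always increases the loss for the same embeddings, which is what pushes the model toward a wider gap between classes.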
Abstract:Given a user's historical interaction sequence, online novel recommendation suggests the next novel the user may be interested in. Online novel recommendation is important but underexplored. In this paper, we concentrate on recommending online novels to new users of an online novel reading platform, i.e., users who first visited the platform within the last seven days. We make two observations about online novel recommendation for new users. First, repeat consumption of novels is common among new users. Second, interactions between users and novels are informative. To accurately predict whether a user will reconsume a novel, it is crucial to characterize each interaction at a fine-grained level. Based on these two observations, we propose a neural network for online novel recommendation, called NovelNet. NovelNet can recommend the next novel from both the user's consumed novels and new novels simultaneously. Specifically, an interaction encoder obtains accurate interaction representations by considering fine-grained attributes of each interaction, and a pointer network with a pointwise loss is incorporated into NovelNet to recommend previously consumed novels. Moreover, we build an online novel recommendation dataset from a well-known online novel reading platform and release it publicly as a benchmark. Experimental results on the dataset demonstrate the effectiveness of NovelNet.
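Recommending from consumed and new novels at once can be pictured as mixing a pointer distribution over the user's history with a distribution over unconsumed novels. The following is a hedged sketch of that general repeat/explore idea, not NovelNet itself: the dot-product attention, the explicit `repeat_gate`, and all names are assumptions, and the interaction encoder and pointwise loss are not modeled. It assumes at least one novel has not yet been consumed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def next_novel_distribution(query, history_emb, history_ids, item_emb, repeat_gate):
    """Mix a pointer (repeat) distribution over consumed novels with an explore
    distribution over unconsumed ones.

    query: (d,) user state; history_emb: (T, d), one row per past interaction;
    history_ids: (T,) novel ids (may repeat); item_emb: (n_items, d);
    repeat_gate: hypothetical scalar in [0, 1], probability of re-consumption."""
    n_items = item_emb.shape[0]
    # Pointer part: attend over interactions, then pool attention mass per novel id
    attn = softmax(history_emb @ query)
    repeat = np.zeros(n_items)
    np.add.at(repeat, history_ids, attn)
    # Explore part: score all novels, masking out those already consumed
    logits = item_emb @ query
    logits[np.unique(history_ids)] = -np.inf
    explore = softmax(logits)
    return repeat_gate * repeat + (1.0 - repeat_gate) * explore
```

`np.add.at` is used because a novel read several times appears at several history positions, and its attention weights should accumulate.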
Abstract:Cervical glandular cell (GC) detection is a key step in computer-aided diagnosis for cervical adenocarcinoma screening. Accurately recognizing GCs in cervical smears is challenging because squamous cells predominate. Out-of-distribution (OOD) data, which are widespread across the entire smear, reduce the reliability of machine learning systems for GC detection. Although state-of-the-art (SOTA) deep learning models can outperform pathologists in preselected regions of interest, the large number of high-confidence false positive (FP) predictions remains unsolved for gigapixel whole slide images (WSIs). This paper proposes PolarNet, a novel network based on morphological prior knowledge of GCs, which aims to solve the FP problem via a self-attention mechanism over the eight-neighborhood. PolarNet estimates the polar orientation of the GC nucleus. As a plug-in module, it can guide the deep features and predicted confidence of general object detection models. In experiments, general models based on four different frameworks rejected FPs on a small image set and increased the mean average precision (mAP) by 0.007 to 0.015 on average, with the best result exceeding a recent cervical cell detection model by 0.037. With PolarNet plugged in, the deployed C++ program improved top-20 GC detection accuracy on external WSIs by 8.8%, at the cost of 14.4 s of computation time. Code is available at https://github.com/Chrisa142857/PolarNet-GCdet
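Self-attention restricted to the eight-neighborhood can be sketched as each grid position attending only to its eight surrounding positions. In this hypothetical NumPy version, zero padding, scaled dot-product scores, and reading an orientation angle from the attention-weighted neighbor offsets are all assumptions for illustration, not PolarNet's published design.

```python
import numpy as np

# (dy, dx) offsets of the eight neighbors in row-major order; index 4 is the right neighbor
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def eight_neighbor_attention(feat):
    """Each position of an (H, W, C) feature map attends to its 8 neighbors only.

    Returns attended features (H, W, C) and attention weights (H, W, 8).
    Borders are zero-padded (an assumption)."""
    H, W, C = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    neigh = np.stack([padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W] for dy, dx in OFFSETS], axis=2)
    scores = np.einsum("hwc,hwkc->hwk", feat, neigh) / np.sqrt(C)  # scaled dot products
    scores -= scores.max(axis=2, keepdims=True)                    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=2, keepdims=True)
    return np.einsum("hwk,hwkc->hwc", weights, neigh), weights

def polar_angle(weights):
    """Hypothetical orientation readout: angle (radians) of the attention-weighted
    mean neighbor offset; 0 points right, pi/2 points down (image row axis)."""
    offsets = np.array(OFFSETS, dtype=float)  # (8, 2) as (dy, dx)
    vec = np.einsum("hwk,kd->hwd", weights, offsets)
    return np.arctan2(vec[..., 0], vec[..., 1])
```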