Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hantao Zhang

CAFusion: Controllable Anatomical Synthesis of Perirectal Lymph Nodes via SDF-guided Diffusion

Mar 10, 2025

Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Peiquan Jin

Abstract:Lesion synthesis methods have made significant progress in generating large-scale synthetic datasets. However, existing approaches predominantly focus on texture synthesis and often fail to accurately model masks for anatomically complex lesions. Additionally, these methods typically lack precise control over the synthesis process. For example, perirectal lymph nodes, which range in diameter from 1 mm to 10 mm, exhibit irregular and intricate contours that are challenging for current techniques to replicate faithfully. To address these limitations, we introduce CAFusion, a novel approach for synthesizing perirectal lymph nodes. By leveraging Signed Distance Functions (SDF), CAFusion generates highly realistic 3D anatomical structures. Furthermore, it offers flexible control over both anatomical and textural features by decoupling the generation of morphological attributes (such as shape, size, and position) from textural characteristics, including signal intensity. Experimental results demonstrate that our synthetic data substantially improve segmentation performance, achieving a 6.45% increase in the Dice coefficient. In the visual Turing test, experienced radiologists found it challenging to distinguish between synthetic and real lesions, highlighting the high degree of realism and anatomical accuracy achieved by our approach. These findings validate the effectiveness of our method in generating high-quality synthetic lesions for advancing medical image processing applications.

Via

Access Paper or Ask Questions

DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion

Mar 09, 2025

Hantao Zhang, Yuhe Liu, Jiancheng Yang, Weidong Guo, Xinyuan Wang, Pascal Fua

Abstract:Accurate medical image segmentation is crucial for precise anatomical delineation. Deep learning models like U-Net have shown great success but depend heavily on large datasets and struggle with domain shifts, complex structures, and limited training samples. Recent studies have explored diffusion models for segmentation by iteratively refining masks. However, these methods still retain the conventional image-to-mask mapping, making them highly sensitive to input data, which hampers stability and generalization. In contrast, we introduce DiffAtlas, a novel generative framework that models both images and masks through diffusion during training, effectively ``GenAI-fying'' atlas-based segmentation. During testing, the model is guided to generate a specific target image-mask pair, from which the corresponding mask is obtained. DiffAtlas retains the robustness of the atlas paradigm while overcoming its scalability and domain-specific limitations. Extensive experiments on CT and MRI across same-domain, cross-modality, varying-domain, and different data-scale settings using the MMWHS and TotalSegmentator datasets demonstrate that our approach outperforms existing methods, particularly in limited-data and zero-shot modality segmentation. Code is available at https://github.com/M3DV/DiffAtlas.

* 11 pages

Via

Access Paper or Ask Questions

Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Sep 16, 2024

Dongyu Gong, Hantao Zhang

Figure 1 for Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Figure 2 for Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Figure 3 for Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Figure 4 for Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Abstract:Recent work on Transformer-based large language models (LLMs) has revealed striking limits in their working memory capacity, similar to what has been found in human behavioral studies. Specifically, these models' performance drops significantly on N-back tasks as N increases. However, there is still a lack of mechanistic interpretability as to why this phenomenon would arise. Inspired by the executive attention theory from behavioral sciences, we hypothesize that the self-attention mechanism within Transformer-based models might be responsible for their working memory capacity limits. To test this hypothesis, we train vanilla decoder-only transformers to perform N-back tasks and find that attention scores gradually aggregate to the N-back positions over training, suggesting that the model masters the task by learning a strategy to pay attention to the relationship between the current position and the N-back position. Critically, we find that the total entropy of the attention score matrix increases as N increases, suggesting that the dispersion of attention scores might be the cause of the capacity limit observed in N-back tasks.

* 8 pages, 12 figures

Via

Access Paper or Ask Questions

LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

Aug 27, 2024

Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

Figure 1 for LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

Figure 2 for LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

Figure 3 for LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

Figure 4 for LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

Abstract:Accurate segmentation of rectal lymph nodes is crucial for the staging and treatment planning of rectal cancer. However, the complexity of the surrounding anatomical structures and the scarcity of annotated data pose significant challenges. This study introduces a novel lymph node synthesis technique aimed at generating diverse and realistic synthetic rectal lymph node samples to mitigate the reliance on manual annotation. Unlike direct diffusion methods, which often produce masks that are discontinuous and of suboptimal quality, our approach leverages an implicit SDF-based method for mask generation, ensuring the production of continuous, stable, and morphologically diverse masks. Experimental results demonstrate that our synthetic data significantly improves segmentation performance. Our work highlights the potential of diffusion model for accurately synthesizing structurally complex lesions, such as lymph nodes in rectal cancer, alleviating the challenge of limited annotated data in this field and aiding in advancements in rectal cancer diagnosis and treatment.

* 8 pages

Via

Access Paper or Ask Questions

Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process

Jul 17, 2024

Yang Cheng, Qingyuan Shu, Albert Lee, Haoran He, Ivy Zhu, Haris Suhail, Minzhang Chen, Renhe Chen, Zirui Wang, Hantao Zhang(+9 more)

Figure 1 for Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process

Figure 2 for Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process

Figure 3 for Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process

Figure 4 for Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process

Abstract:Stochastic diffusion processes are pervasive in nature, from the seemingly erratic Brownian motion to the complex interactions of synaptically-coupled spiking neurons. Recently, drawing inspiration from Langevin dynamics, neuromorphic diffusion models were proposed and have become one of the major breakthroughs in the field of generative artificial intelligence. Unlike discriminative models that have been well developed to tackle classification or regression tasks, diffusion models as well as other generative models such as ChatGPT aim at creating content based upon contexts learned. However, the more complex algorithms of these models result in high computational costs using today's technologies, creating a bottleneck in their efficiency, and impeding further development. Here, we develop a spintronic voltage-controlled magnetoelectric memory hardware for the neuromorphic diffusion process. The in-memory computing capability of our spintronic devices goes beyond current Von Neumann architecture, where memory and computing units are separated. Together with the non-volatility of magnetic memory, we can achieve high-speed and low-cost computing, which is desirable for the increasing scale of generative models in the current era. We experimentally demonstrate that the hardware-based true random diffusion process can be implemented for image generation and achieve comparable image quality to software-based training as measured by the Frechet inception distance (FID) score, achieving ~10^3 better energy-per-bit-per-area over traditional hardware.

Via

Access Paper or Ask Questions

Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

May 26, 2024

Xijie Huang, Xinyuan Wang, Hantao Zhang, Jiawen Xi, Jingkun An, Hao Wang, Chengwei Pan

Figure 1 for Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Figure 2 for Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Figure 3 for Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Figure 4 for Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Abstract:Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications for Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper delves into the underexplored security vulnerabilities of MedMLLMs, especially when deployed in clinical environments where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks: mismatched malicious attack (2M-attack) and optimized mismatched malicious attack (O2M-attack). Using our own constructed voluminous 3MAD dataset, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs. Evaluations with this dataset and novel attack methods, including white-box attacks on LLaVA-Med and transfer attacks on four other state-of-the-art models, indicate that even MedMLLMs designed with enhanced security features are vulnerable to security breaches. Our work underscores the urgent need for a concerted effort to implement robust security measures and enhance the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically significant exploits in medical settings. For further research and replication, anonymous access to our code is available at https://github.com/dirtycomputer/O2M_attack. Warning: Medical large model jailbreaking may generate content that includes unverified diagnoses and treatment recommendations. Always consult professional medical advice.

Via

Access Paper or Ask Questions

Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

Apr 13, 2024

Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Jun Li, Peiquan Jin

Figure 1 for Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

Figure 2 for Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

Figure 3 for Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

Figure 4 for Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

Abstract:Accurate segmentation of metastatic lymph nodes in rectal cancer is crucial for the staging and treatment of rectal cancer. However, existing segmentation approaches face challenges due to the absence of pixel-level annotated datasets tailored for lymph nodes around the rectum. Additionally, metastatic lymph nodes are characterized by their relatively small size, irregular shapes, and lower contrast compared to the background, further complicating the segmentation task. To address these challenges, we present the first large-scale perirectal metastatic lymph node CT image dataset called Meply, which encompasses pixel-level annotations of 269 patients diagnosed with rectal cancer. Furthermore, we introduce a novel lymph-node segmentation model named CoSAM. The CoSAM utilizes sequence-based detection to guide the segmentation of metastatic lymph nodes in rectal cancer, contributing to improved localization performance for the segmentation model. It comprises three key components: sequence-based detection module, segmentation module, and collaborative convergence unit. To evaluate the effectiveness of CoSAM, we systematically compare its performance with several popular segmentation methods using the Meply dataset. Our code and dataset will be publicly available at: https://github.com/kanydao/CoSAM.

* 13 pages

Via

Access Paper or Ask Questions

LeFusion: Synthesizing Myocardial Pathology on Cardiac MRI via Lesion-Focus Diffusion Models

Mar 21, 2024

Hantao Zhang, Jiancheng Yang, Shouhong Wan, Pascal Fua

Abstract:Data generated in clinical practice often exhibits biases, such as long-tail imbalance and algorithmic unfairness. This study aims to mitigate these challenges through data synthesis. Previous efforts in medical imaging synthesis have struggled with separating lesion information from background context, leading to difficulties in generating high-quality backgrounds and limited control over the synthetic output. Inspired by diffusion-based image inpainting, we propose LeFusion, lesion-focused diffusion models. By redesigning the diffusion learning objectives to concentrate on lesion areas, it simplifies the model learning process and enhance the controllability of the synthetic output, while preserving background by integrating forward-diffused background contexts into the reverse diffusion process. Furthermore, we generalize it to jointly handle multi-class lesions, and further introduce a generative model for lesion masks to increase synthesis diversity. Validated on the DE-MRI cardiac lesion segmentation dataset (Emidec), our methodology employs the popular nnUNet to demonstrate that the synthetic data make it possible to effectively enhance a state-of-the-art model. Code and model are available at https://github.com/M3DV/LeFusion.

* 13 pages

Via

Access Paper or Ask Questions

CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Aug 16, 2023

Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

Figure 1 for CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Figure 2 for CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Figure 3 for CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Figure 4 for CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Abstract:Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer. Additionally, a major obstacle is the lack of a large-scale, finely annotated CT image dataset for rectal cancer segmentation. To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum, which serves as a valuable resource for algorithm research and clinical application development. Moreover, we propose a novel medical cancer lesion segmentation benchmark model named U-SAM. The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information. U-SAM contains three key components: promptable information (e.g., points) to aid in target area localization, a convolution module for capturing low-level lesion details, and skip-connections to preserve and recover spatial information during the encoding-decoding process. To evaluate the effectiveness of U-SAM, we systematically compare its performance with several popular segmentation methods on the CARE dataset. The generalization of the model is further verified on the WORD dataset. Extensive experiments demonstrate that the proposed U-SAM outperforms state-of-the-art methods on these two datasets. These experiments can serve as the baseline for future research and clinical application development.

* 8 pages

Via

Access Paper or Ask Questions