Xin You

Shape-aware Sampling Matters in the Modeling of Multi-Class Tubular Structures

Jun 14, 2025

Minghui Zhang, Yaoyu Liu, Xin You, Hanxiao Zhang, Yun Gu

Abstract: Accurate multi-class tubular modeling is critical for precise lesion localization and optimal treatment planning. Deep learning methods enable automated shape modeling by prioritizing volumetric overlap accuracy. However, overlap accuracy does not fully capture the inherent complexity of fine-grained semantic tubular shapes, resulting in reduced topological preservation. To address this, we propose Shape-aware Sampling (SAS), which optimizes patch-size allocation for online sampling and extracts a topology-preserved skeletal representation for the objective function. Fractal Dimension-based Patchsize (FDPS) is first introduced to quantify semantic tubular shape complexity through axis-specific fractal dimension analysis. Axes with higher fractal complexity are then sampled with smaller patch sizes to capture fine-grained features and resolve structural intricacies. In addition, Minimum Path-Cost Skeletonization (MPC-Skel) is employed to sample topologically consistent skeletal representations of semantic tubular shapes for skeleton-weighted objective functions. MPC-Skel reduces artifacts from conventional skeletonization methods and directs the focus to critical topological regions, enhancing tubular topology preservation. SAS is computationally efficient and easily integrable into optimization pipelines. Evaluation on two semantic tubular datasets showed consistent improvements in both volumetric overlap and topological integrity metrics.
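
The abstract above quantifies shape complexity with a fractal dimension. As a rough illustration of that idea only, the sketch below estimates a box-counting dimension for a set of foreground voxel coordinates. It is a generic estimator, not the paper's axis-specific FDPS procedure; the function names `box_count` and `box_counting_dimension` and the default box sizes are assumptions made for this example.

```python
import math


def box_count(voxels, box_size):
    """Count the boxes of edge `box_size` that contain at least one voxel."""
    return len({tuple(c // box_size for c in v) for v in voxels})


def box_counting_dimension(voxels, box_sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) against log(1/s).

    Generic box-counting estimate; the box sizes are an assumption
    chosen for this illustration.
    """
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(voxels, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# A straight 64-voxel line and a filled 64x64 plane as toy shapes.
line = [(x, 0, 0) for x in range(64)]
plane = [(x, y, 0) for x in range(64) for y in range(64)]

line_dim = box_counting_dimension(line)
plane_dim = box_counting_dimension(plane)
```

On these toy shapes the estimate is close to 1 for the line and close to 2 for the plane; a branching tubular structure would typically fall in between. How such a value maps to a patch size is specific to the paper and is not reproduced here.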

Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis

May 22, 2025

Xin You, Minghui Zhang, Hanxiao Zhang, Jie Yang, Nassir Navab

Abstract: Temporal modeling of regular respiration-induced motions is crucial to image-guided clinical applications. Existing methods cannot simulate temporal motions unless high-dose imaging scans including starting and ending frames exist simultaneously. However, in the preoperative data acquisition stage, the slight movement of patients may result in dynamic backgrounds between the first and last frames in a respiratory period. This additional deviation can hardly be removed by image registration, thus affecting the temporal modeling. To address that limitation, we pioneer the simulation of the regular motion process via the image-to-video (I2V) synthesis framework, which animates the first frame to forecast future frames of a given length. Besides, to promote the temporal consistency of animated videos, we devise the Temporal Differential Diffusion Model to generate temporal differential fields, which measure the relative differential representations between adjacent frames. The prompt attention layer is devised for fine-grained differential fields, and the field augmented layer is adopted to better integrate these fields into the I2V framework, promoting more accurate temporal variation of synthesized videos. Extensive results on ACDC cardiac and 4D Lung datasets reveal that our approach simulates 4D videos along the intrinsic motion trajectory, rivaling other competitive methods on perceptual similarity and temporal consistency. Code will be available soon.

* Early accepted by MICCAI

Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge

Feb 07, 2025

Muhammad Imran, Jonathan R. Krebs, Vishal Balaji Sivaraman, Teng Zhang, Amarjeet Kumar, Walker R. Ueland, Michael J. Fassler, Jinlong Huang, Xiao Sun, Lisheng Wang(+53 more)

Abstract: Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently available to support the development of multi-class aortic segmentation methods. To address this gap, we organized the AortaSeg24 MICCAI Challenge, introducing the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This dataset was designed to facilitate both model development and validation. The challenge attracted 121 teams worldwide, with participants leveraging state-of-the-art frameworks such as nnU-Net and exploring novel techniques, including cascaded models, data augmentation strategies, and custom loss functions. We evaluated the submitted algorithms using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD), highlighting the approaches adopted by the top five performing teams. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms. The annotated dataset, evaluation code, and implementations of the leading methods are publicly available to support further research. All resources can be accessed at https://aortaseg24.grand-challenge.org.
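
For readers unfamiliar with the Dice Similarity Coefficient named in the abstract above, here is a minimal per-class sketch on flat label lists. It is not the challenge's evaluation code; the helper names, the toy labels, and the convention that two empty masks score 1.0 are assumptions for illustration, and NSD is not covered.

```python
def dice_coefficient(pred, target):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two sets of foreground indices."""
    pred, target = set(pred), set(target)
    if not pred and not target:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & target) / (len(pred) + len(target))


def dice_per_class(pred_labels, true_labels, classes):
    """Compute the DSC separately for each class label."""
    scores = {}
    for c in classes:
        p = {i for i, v in enumerate(pred_labels) if v == c}
        t = {i for i, v in enumerate(true_labels) if v == c}
        scores[c] = dice_coefficient(p, t)
    return scores


# Toy flattened label maps with two foreground classes (1 and 2).
pred_labels = [0, 1, 1, 2, 2, 2]
true_labels = [0, 1, 2, 2, 2, 2]
scores = dice_per_class(pred_labels, true_labels, classes=[1, 2])
```

In this toy case class 1 overlaps on one voxel out of three labelled in total (DSC = 2/3), and class 2 overlaps on three out of seven (DSC = 6/7).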

Topology-Aware Exploration of Circle of Willis for CTA and MRA: Segmentation, Detection, and Classification

Oct 21, 2024

Minghui Zhang, Xin You, Hanxiao Zhang, Yun Gu

Abstract: The Circle of Willis (CoW) is critical for connecting the major circulations of the brain. The topology of this vascular structure is of clinical significance for evaluating the risk and severity of neurovascular diseases. The CoW has two representative angiographic imaging modalities, computed tomography angiography (CTA) and magnetic resonance angiography (MRA). TopCoW24 provided a dataset of 125 paired CTA-MRA scans for the analysis of the CoW. To explore both CTA and MRA images in a unified framework and learn the inherent topology of the CoW, we construct a universal dataset via independent intensity preprocessing, followed by joint resampling and normalization. Then, we utilize a topology-aware loss to enhance the topological completeness of the CoW and the discrimination between different classes. A complementary topology-aware refinement is further conducted to enhance the connectivity within the same class. Our method was evaluated on all three tasks and both modalities, achieving competitive results. In the final test phase of the TopCoW24 Challenge, we achieved second place in the CTA-Seg-Task, third place in the CTA-Box-Task, first place in the CTA-Edg-Task, second place in the MRA-Seg-Task, third place in the MRA-Box-Task, and second place in the MRA-Edg-Task.

* Participation technical report for TopCoW24 challenge @ MICCAI 2024

PASS: Test-Time Prompting to Adapt Styles and Semantic Shapes in Medical Image Segmentation

Oct 02, 2024

Chuyan Zhang, Hao Zheng, Xin You, Yefeng Zheng, Yun Gu

Abstract: Test-time adaptation (TTA) has emerged as a promising paradigm to handle the domain shifts at test time for medical images from different institutions without using extra training data. However, existing TTA solutions for segmentation tasks suffer from (1) dependency on modifying the source training stage and access to source priors or (2) lack of emphasis on shape-related semantic knowledge that is crucial for segmentation tasks. Recent research on visual prompt learning achieves source-relaxed adaptation through an extended parameter space but still neglects the full utilization of semantic features, thus motivating our work on knowledge-enriched deep prompt learning. Beyond the general concern of image style shifts, we reveal that shape variability is another crucial factor causing the performance drop. To address this issue, we propose a TTA framework called PASS (Prompting to Adapt Styles and Semantic shapes), which jointly learns two types of prompts: the input-space prompt to reformulate the style of the test image to fit into the pretrained model and the semantic-aware prompts to bridge high-level shape discrepancy across domains. Instead of naively imposing a fixed prompt, we introduce an input decorator to generate the self-regulating visual prompt conditioned on the input data. To retrieve the knowledge representations and customize target-specific shape prompts for each test sample, we propose a cross-attention prompt modulator, which performs interaction between target representations and an enriched shape prompt bank. Extensive experiments demonstrate the superior performance of PASS over state-of-the-art methods on multiple medical image segmentation datasets. The code is available at https://github.com/EndoluminalSurgicalVision-IMR/PASS.

* Submitted to IEEE TMI

SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation

Jul 11, 2024

Xin You, Yixin Lou, Minghui Zhang, Chuyan Zhang, Jie Yang, Yun Gu

Abstract: Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods, especially single-stage ones, still suffer from the challenge of intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a single vertebra. For multi-stage methods, vertebrae detection, which serves as the first step, is affected by pathology and metal implants. Thus, incorrect detections cause biased patches before segmentation, which then lead to inconsistent labeling and segmentation. In our work, motivated by the perspective of instance segmentation, we try to label individual and complete binary masks to address this limitation. Specifically, a contour-based network is proposed based on Structural Low-Rank Descriptors for shape consistency, termed SLoRD. These contour descriptors are acquired in a data-driven manner in advance. For a more precise representation of contour descriptors, we adopt the spherical coordinate system and devise the spherical centroid. Besides, the contour loss is designed to impose explicit consistency constraints, facilitating regressed contour points close to vertebral boundaries. Quantitative and qualitative evaluations on VerSe 2019 demonstrate the superior performance of our framework over other single-stage and multi-stage state-of-the-art (SOTA) methods.

* Under review
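
The SLoRD abstract above represents contour points in a spherical coordinate system around a centroid. The sketch below shows only the standard Cartesian-to-spherical conversion relative to a center point. Using the arithmetic mean as the center is an assumption for this example; the paper's "spherical centroid" is not specified in the abstract and is not reproduced here.

```python
import math


def centroid(points):
    """Arithmetic mean of 3D points (an assumed stand-in for the center)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def to_spherical(point, center):
    """Return (r, theta, phi) of `point` relative to `center`.

    r is the radius, theta the polar angle from +z, and phi the
    azimuth in the x-y plane.
    """
    dx, dy, dz = (point[i] - center[i] for i in range(3))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    theta = math.acos(max(-1.0, min(1.0, dz / r)))
    phi = math.atan2(dy, dx)
    return r, theta, phi


# Six toy contour points on the axes, centered on the origin.
contour = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 2), (0, 0, -2)]
center = centroid(contour)
spherical = [to_spherical(p, center) for p in contour]
```

Sampling the radius `r` on a fixed grid of `(theta, phi)` directions is one common way to turn a closed surface into a fixed-length vector; whether SLoRD does exactly this is not stated in the abstract.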

PnPNet: Pull-and-Push Networks for Volumetric Segmentation with Boundary Confusion

Dec 13, 2023

Xin You, Ming Ding, Minghui Zhang, Hanxiao Zhang, Yi Yu, Jie Yang, Yun Gu

Abstract: Precise boundary segmentation of volumetric images is a critical task for image-guided diagnosis and computer-assisted intervention, especially for boundary confusion in clinical practice. However, U-shape networks cannot effectively resolve this challenge due to the lack of boundary shape constraints. Besides, existing methods of refining boundaries overemphasize the slender structure, which results in overfitting due to networks' limited ability to model tiny objects. In this paper, we reconceptualize the mechanism of boundary generation by encompassing the interaction dynamics with adjacent regions. Moreover, we propose a unified network termed PnPNet to model shape characteristics of the confused boundary region. The core ingredients of PnPNet are the pushing and pulling branches. Specifically, based on diffusion theory, we devise the semantic difference module (SDM) from the pushing branch to squeeze the boundary region. Explicit and implicit differential information inside SDM significantly boosts representation abilities for inter-class boundaries. Additionally, motivated by the K-means algorithm, the class clustering module (CCM) from the pulling branch is introduced to stretch the intersected boundary region. Thus, pushing and pulling branches will shrink and enlarge the boundary uncertainty respectively. They furnish two adversarial forces that encourage models to output a more precise delineation of boundaries. We carry out experiments on three challenging public datasets and one in-house dataset, containing three types of boundary confusion in model predictions. Experimental results demonstrate the superiority of PnPNet over other segmentation networks, especially on evaluation metrics of HD and ASSD. Besides, pushing and pulling branches can serve as plug-and-play modules to enhance classic U-shape baseline models. Code is available.

* 13 Figures, 8 Tables

Implicit Shape Modeling for Anatomical Structure Refinement of Volumetric Medical Images

Dec 11, 2023

Minghui Zhang, Hanxiao Zhang, Xin You, Yun Gu

Abstract: Shape modeling of volumetric medical images is a critical task for quantitative analysis and surgical planning in computer-aided diagnosis. To relieve the burden of expert clinicians, the reconstructed shapes are widely acquired from deep learning models, e.g. Convolutional Neural Networks (CNNs), followed by the marching cubes algorithm. However, automatically obtaining reconstructed shapes cannot always achieve perfect results due to the limited resolution of images and lack of shape prior constraints. In this paper, we design a unified framework for the refinement of medical image segmentation on top of an implicit neural network. Specifically, to learn a sharable shape prior from different instances within the same category in the training phase, the physical information of volumetric medical images is first utilized to construct the Physical-Informed Continuous Coordinate Transform (PICCT). PICCT transforms the input data into an aligned form that is fed into the implicit shape modeling. To better learn shape representation, we introduce implicit shape constraints on top of the signed distance function (SDF) into the implicit shape modeling of both instances and latent template. For the inference phase, a template interaction module (TIM) is proposed to refine initial results produced by CNNs via deforming deep implicit templates with latent codes. Experimental results on three datasets demonstrated the superiority of our approach in shape refinement. The Chamfer Distance/Earth Mover's Distance achieved by the proposed method are 0.232/0.087 on the Liver dataset, 0.128/0.069 on the Pancreas dataset, and 0.417/0.100 on the Lung Lobe dataset.
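
The abstract above reports Chamfer Distance between reconstructed and reference shapes. Definitions vary across papers (squared or unsquared distances, sums or means), and the abstract does not say which variant is used. The sketch below assumes one common form: the mean unsquared nearest-neighbor distance, summed over both directions. The brute-force search and the function names are choices made for this example only.

```python
import math


def _nearest(p, cloud):
    """Euclidean distance from point p to its nearest neighbor in cloud."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point clouds.

    Assumed variant: mean nearest-neighbor distance from a to b plus
    the mean nearest-neighbor distance from b to a.
    """
    forward = sum(_nearest(p, b) for p in a) / len(a)
    backward = sum(_nearest(q, a) for q in b) / len(b)
    return forward + backward


# Two toy point clouds that differ in one point.
cloud_a = [(0, 0, 0), (1, 0, 0)]
cloud_b = [(0, 0, 0), (1, 0, 1)]
same = chamfer_distance(cloud_a, cloud_a)
diff = chamfer_distance(cloud_a, cloud_b)
```

The brute-force search costs O(|a|·|b|) distance evaluations, which is fine for small examples; real surface meshes usually call for a spatial index such as a k-d tree.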

Learning with Explicit Shape Priors for Medical Image Segmentation

Mar 31, 2023

Xin You, Junjun He, Jie Yang, Yun Gu

Abstract: Medical image segmentation is considered a basic step for medical image analysis and surgical intervention. Many previous works have attempted to incorporate shape priors when designing segmentation models, which helps to attain finer masks with anatomical shape information. In this work, we discuss in detail three types of segmentation models with shape priors: atlas-based models, statistical-based models and UNet-based models. Because the former two kinds of methods generalize poorly, UNet-based models have dominated the field of medical image segmentation in recent years. However, existing UNet-based models tend to employ implicit shape priors, which lack good interpretability and generalization ability on different organs with distinctive shapes. Thus, we propose a novel shape prior module (SPM), which explicitly introduces shape priors to improve the segmentation performance of UNet-based models. To evaluate the effectiveness of SPM, we conduct experiments on three challenging public datasets. Our proposed model achieves state-of-the-art performance. Furthermore, SPM shows an outstanding generalization ability on different classic convolutional neural networks (CNNs) and recent Transformer-based backbones, and it can serve as a plug-and-play structure for the segmentation task of different datasets.

* 23 pages, 11 figures

The Deep Learning Compiler: A Comprehensive Survey

Feb 27, 2020

Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian

Abstract: The difficulty of deploying various deep learning (DL) models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed from both industry and academia such as TensorFlow XLA and TVM. Similarly, the DL compilers take the DL models described in different DL frameworks as input, and then generate optimized codes for diverse DL hardware as output. However, none of the existing surveys has analyzed the unique design of the DL compilers comprehensively. In this paper, we perform a comprehensive survey of existing DL compilers by dissecting the commonly adopted design in detail, with emphasis on the DL-oriented multi-level IRs and frontend/backend optimizations. Specifically, we provide a comprehensive comparison among existing DL compilers from various aspects. In addition, we present detailed analysis of the multi-level IR design and compiler optimization techniques. Finally, several insights are highlighted as the potential research directions of DL compilers. This is the first survey paper focusing on the unique design of DL compilers, which we hope can pave the road for future research on DL compilers.
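
To make the idea of a frontend optimization over a graph-level IR more concrete, the sketch below applies constant folding to a toy expression graph. The node classes `Const`, `Var` and `Op` and the `fold_constants` pass are hypothetical and invented for this example; they do not correspond to the IR or API of XLA, TVM, or any other real compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    kind: str    # "add" or "mul" in this toy IR
    args: tuple  # child nodes: Const, Var, or Op


# Evaluation rules for the two operators of the toy IR.
_EVAL = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def fold_constants(node):
    """Replace every operator whose inputs are all constants by its value."""
    if not isinstance(node, Op):
        return node
    args = tuple(fold_constants(a) for a in node.args)
    if all(isinstance(a, Const) for a in args):
        return Const(_EVAL[node.kind](*(a.value for a in args)))
    return Op(node.kind, args)


# x * (2 + 3) folds to x * 5 because the inner add has constant inputs only.
expr = Op("mul", (Var("x"), Op("add", (Const(2.0), Const(3.0)))))
folded = fold_constants(expr)
```

The pass rewrites the graph bottom-up, so a constant subexpression nested inside a non-constant one is still folded, while anything that depends on the runtime input `x` is left in place.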
