Abstract: Generative self-supervised learning demonstrates outstanding representation learning capabilities in both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). However, there are currently no generative pre-training methods for selective state space models (Mamba), which handle long-range dependencies effectively. To address this gap, we introduce MambaMIM, a generative self-supervised learning method for Mamba based on Selective Structured State Space Sequence Token-interpolation (S6T), serving as a general-purpose pre-training method for arbitrary Mamba architectures. MambaMIM incorporates a bottom-up 3D hybrid masking strategy in the encoder to maintain masking consistency across different architectures, while S6T is employed to learn causal relationships within the masked sequence in the state space. MambaMIM can be applied to any pure or hybrid Mamba architecture to enhance its long-range representation capability. Extensive downstream experiments demonstrate the feasibility and advantages of using Mamba for pre-training on medical image tasks. The code is available at: https://github.com/FengheTan9/MambaMIM
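The abstract does not specify the mechanics of S6T, so the following is only a minimal, hypothetical sketch of the token-interpolation idea it names: visible tokens are scattered back to their original positions, masked positions are filled with a learnable token, and the interpolated sequence is processed causally by a state-space block. The class name, the seq_model argument, and all shapes are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class MaskedTokenInterpolation(nn.Module):
        """Hypothetical sketch of S6T-style token interpolation:
        visible tokens are scattered back to their original positions,
        masked positions are filled with a learnable token, and the
        interpolated sequence is passed to a causal sequence model."""
        def __init__(self, dim: int, seq_model: nn.Module):
            super().__init__()
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.seq_model = seq_model  # e.g. a Mamba/SSM block

        def forward(self, visible: torch.Tensor, mask: torch.Tensor):
            # visible: (B, N_vis, D) tokens kept by the encoder
            # mask:    (B, N) boolean, True where a position was masked
            B, N = mask.shape
            D = visible.size(-1)
            full = self.mask_token.expand(B, N, D).clone()
            full[~mask] = visible.reshape(-1, D)  # scatter visible tokens back
            return self.seq_model(full)           # causal pass over full sequence

In practice, seq_model could be any causal sequence module, for example a Mamba block from the mamba_ssm package (Mamba(d_model=dim)); the choice here is left open because the abstract does not pin it down.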
Abstract: In this paper, we present a high-performance deep neural network for weak-target image segmentation, covering both medical and infrared image segmentation. To this end, we analyze existing dynamic convolutions and propose dynamic parameter convolution (DPConv). We further reevaluate the relationship between reconstruction and segmentation tasks from the perspective of DPConv, leading to a dual-network model called the Siamese Reconstruction-Segmentation Network (SRSNet). The proposed model is not only a universal network but also leverages the reconstruction task to enhance segmentation performance without altering its structure. Moreover, as the amount of training data for the reconstruction network increases, the performance of the segmentation network improves in tandem. On seven datasets, including five medical and two infrared image datasets, SRSNet consistently achieves the best segmentation results. The code is released at https://github.com/fidshu/SRSNet.
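The abstract names DPConv but not its formulation. Below is a minimal, hypothetical sketch of one common reading of "dynamic parameter convolution": a small hypernetwork generates per-sample depthwise kernels from a conditioning feature, here imagined to come from the Siamese reconstruction branch. The class name, the cond input, and all layer sizes are illustrative assumptions, not the paper's design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DynamicParamConv(nn.Module):
        """Hypothetical sketch of a dynamic parameter convolution:
        depthwise kernel weights are generated per sample from a
        conditioning feature instead of being fixed after training."""
        def __init__(self, channels: int, k: int = 3):
            super().__init__()
            self.channels, self.k = channels, k
            # hypernetwork: global context -> depthwise kernel parameters
            self.gen = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(channels, channels * k * k),
            )

        def forward(self, x: torch.Tensor, cond: torch.Tensor):
            # x:    (B, C, H, W) segmentation features
            # cond: (B, C, H, W) features from the sibling (reconstruction) branch
            B, C, H, W = x.shape
            w = self.gen(cond).view(B * C, 1, self.k, self.k)
            # grouped conv applies each sample's own depthwise kernels
            y = F.conv2d(x.reshape(1, B * C, H, W), w,
                         padding=self.k // 2, groups=B * C)
            return y.view(B, C, H, W)

The grouped-convolution trick (folding the batch into the channel dimension with groups=B*C) is just one standard way to apply per-sample kernels efficiently; the actual SRSNet formulation may differ.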
Abstract: Due to the scarcity of medical data and the specific imaging characteristics of medical images, light-weighting Vision Transformers (ViTs) for efficient medical image segmentation remains a significant challenge that current studies have largely overlooked. This work revisits the relationship between CNNs and Transformers in lightweight universal networks for medical image segmentation, aiming to integrate the advantages of both at the level of infrastructure design. To leverage the inductive bias inherent in CNNs, we abstract a Transformer-like lightweight CNN block (ConvUtr) as the patch embedding of the ViT, feeding the Transformer denoised, non-redundant, and highly condensed semantic information. Moreover, an adaptive Local-Global-Local (LGL) block is introduced to facilitate efficient local-to-global information exchange, maximizing the Transformer's capacity for extracting global context. Finally, we build MobileUtr, an efficient medical image segmentation model based on CNNs and Transformers. Extensive experiments on five public medical image datasets spanning three modalities demonstrate the superiority of MobileUtr over state-of-the-art methods, with lighter weights and lower computational cost. Code is available at https://github.com/FengheTan9/MobileUtr.
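The abstract does not detail the LGL block, so the sketch below only illustrates the local-to-global-to-local pattern it describes: a depthwise convolution gathers local detail, multi-head self-attention over the flattened feature map mixes global context, and a second depthwise convolution redistributes it locally. All module choices and sizes are assumptions, not MobileUtr's actual design.

    import torch
    import torch.nn as nn

    class LGLBlock(nn.Module):
        """Hypothetical sketch of a Local-Global-Local exchange:
        local depthwise conv -> global self-attention -> local depthwise conv.
        Module choices and sizes are illustrative, not the paper's."""
        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.local_in = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.local_out = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)

        def forward(self, x: torch.Tensor):
            # x: (B, C, H, W)
            B, C, H, W = x.shape
            x = self.local_in(x) + x                        # local aggregation
            t = x.flatten(2).transpose(1, 2)                # (B, HW, C) tokens
            q = self.norm(t)
            t = t + self.attn(q, q, q, need_weights=False)[0]  # global mixing
            x = t.transpose(1, 2).reshape(B, C, H, W)
            return self.local_out(x) + x                    # local redistribution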