Abstract:Automatic melanoma segmentation is essential for early skin cancer detection, yet challenges arise from the heterogeneity of melanoma, as well as interfering factors like blurred boundaries, low contrast, and imaging artifacts. While numerous algorithms have been developed to address these issues, previous approaches have often overlooked the need to jointly capture spatial and sequential features within dermatological images. This limitation hampers segmentation accuracy, especially in cases with indistinct borders or structurally similar lesions. Additionally, previous models lacked both a global receptive field and high computational efficiency. In this work, we present the XLSTM-VMUNet Model, which jointly capture spatial and sequential features within derma-tological images successfully. XLSTM-VMUNet can not only specialize in extracting spatial features from images, focusing on the structural characteristics of skin lesions, but also enhance contextual understanding, allowing more effective handling of complex medical image structures. Experiment results on the ISIC2018 dataset demonstrate that XLSTM-VMUNet outperforms VMUNet by 1.25% on DSC and 2.07% on IoU, with faster convergence and consistently high segmentation perfor-mance. Our code of XLSTM-VMUNet is available at https://github.com/MrFang/xLSTM-VMUNet.