Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qiang Han

When Mamba Meets xLSTM: An Efficient and Precise Method with the XLSTM-VMUNet Model for Skin lesion Segmentation

Nov 14, 2024

Zhuoyi Fang, KeXuan Shi, Qiang Han

Figure 1 for When Mamba Meets xLSTM: An Efficient and Precise Method with the XLSTM-VMUNet Model for Skin lesion Segmentation

Figure 2 for When Mamba Meets xLSTM: An Efficient and Precise Method with the XLSTM-VMUNet Model for Skin lesion Segmentation

Figure 3 for When Mamba Meets xLSTM: An Efficient and Precise Method with the XLSTM-VMUNet Model for Skin lesion Segmentation

Figure 4 for When Mamba Meets xLSTM: An Efficient and Precise Method with the XLSTM-VMUNet Model for Skin lesion Segmentation

Abstract:Automatic melanoma segmentation is essential for early skin cancer detection, yet challenges arise from the heterogeneity of melanoma, as well as interfering factors like blurred boundaries, low contrast, and imaging artifacts. While numerous algorithms have been developed to address these issues, previous approaches have often overlooked the need to jointly capture spatial and sequential features within dermatological images. This limitation hampers segmentation accuracy, especially in cases with indistinct borders or structurally similar lesions. Additionally, previous models lacked both a global receptive field and high computational efficiency. In this work, we present the XLSTM-VMUNet Model, which jointly capture spatial and sequential features within derma-tological images successfully. XLSTM-VMUNet can not only specialize in extracting spatial features from images, focusing on the structural characteristics of skin lesions, but also enhance contextual understanding, allowing more effective handling of complex medical image structures. Experiment results on the ISIC2018 dataset demonstrate that XLSTM-VMUNet outperforms VMUNet by 1.25% on DSC and 2.07% on IoU, with faster convergence and consistently high segmentation perfor-mance. Our code of XLSTM-VMUNet is available at https://github.com/MrFang/xLSTM-VMUNet.

Via

Access Paper or Ask Questions

Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach

Dec 01, 2023

Yuxin Li, Qiang Han, Mengying Yu, Yuxin Jiang, Chaikiat Yeo, Yiheng Li, Zihang Huang, Nini Liu, Hsuanhan Chen, Xiaojun Wu

Abstract:3D object detection in Bird's-Eye-View (BEV) space has recently emerged as a prevalent approach in the field of autonomous driving. Despite the demonstrated improvements in accuracy and velocity estimation compared to perspective view methods, the deployment of BEV-based techniques in real-world autonomous vehicles remains challenging. This is primarily due to their reliance on vision-transformer (ViT) based architectures, which introduce quadratic complexity with respect to the input resolution. To address this issue, we propose an efficient BEV-based 3D detection framework called BEVENet, which leverages a convolutional-only architectural design to circumvent the limitations of ViT models while maintaining the effectiveness of BEV-based methods. Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge, achieving a mean average precision (mAP) of 0.456 and a nuScenes detection score (NDS) of 0.555 on the NuScenes validation dataset, with an inference speed of 47.6 frames per second. To the best of our knowledge, this study stands as the first to achieve such significant efficiency improvements for BEV-based methods, highlighting their enhanced feasibility for real-world autonomous driving applications.

Via

Access Paper or Ask Questions

Improvement of a dedicated model for open domain persona-aware dialogue generation

Aug 27, 2020

Qiang Han

Figure 1 for Improvement of a dedicated model for open domain persona-aware dialogue generation

Figure 2 for Improvement of a dedicated model for open domain persona-aware dialogue generation

Abstract:This paper analyzes some speed and performance improvement methods of Transformer architecture in recent years, mainly its application in dedicated model training. The dedicated model studied here refers to the open domain persona-aware dialogue generation model, and the dataset is multi turn short dialogue, The total length of a single input sequence is no more than 105 tokens. Therefore, many improvements in the architecture and attention mechanism of transformer architecture for long sequence processing are not discussed in this paper. The source code of the experiments has been open sourced: https://github.com/ghosthamlet/persona

Via

Access Paper or Ask Questions