Abstract:Convolutional neural networks have primarily led 3D medical image segmentation but may be limited by small receptive fields. Transformer models excel in capturing global relationships through self-attention but are challenged by high computational costs at high resolutions. Recently, Mamba, a state space model, has emerged as an effective approach for sequential modeling. Inspired by its success, we introduce a novel Mamba-based 3D medical image segmentation model called EM-Net. It not only efficiently captures attentive interaction between regions by integrating and selecting channels, but also effectively utilizes frequency domain to harmonize the learning of features across varying scales, while accelerating training speed. Comprehensive experiments on two challenging multi-organ datasets with other state-of-the-art (SOTA) algorithms show that our method exhibits better segmentation accuracy while requiring nearly half the parameter size of SOTA models and 2x faster training speed.
Abstract:Objective: Ultrasound (US) examination has unique advantages in diagnosing carpal tunnel syndrome (CTS) while identifying the median nerve (MN) and diagnosing CTS depends heavily on the expertise of examiners. To alleviate this problem, we aimed to develop a one-stop automated CTS diagnosis system (OSA-CTSD) and evaluate its effectiveness as a computer-aided diagnostic tool. Methods: We combined real-time MN delineation, accurate biometric measurements, and explainable CTS diagnosis into a unified framework, called OSA-CTSD. We collected a total of 32,301 static images from US videos of 90 normal wrists and 40 CTS wrists for evaluation using a simplified scanning protocol. Results: The proposed model showed better segmentation and measurement performance than competing methods, reporting that HD95 score of 7.21px, ASSD score of 2.64px, Dice score of 85.78%, and IoU score of 76.00%, respectively. In the reader study, it demonstrated comparable performance with the average performance of the experienced in classifying the CTS, while outperformed that of the inexperienced radiologists in terms of classification metrics (e.g., accuracy score of 3.59% higher and F1 score of 5.85% higher). Conclusion: The OSA-CTSD demonstrated promising diagnostic performance with the advantages of real-time, automation, and clinical interpretability. The application of such a tool can not only reduce reliance on the expertise of examiners, but also can help to promote the future standardization of the CTS diagnosis process, benefiting both patients and radiologists.
Abstract:Breast lesion segmentation from breast ultrasound (BUS) videos could assist in early diagnosis and treatment. Existing video object segmentation (VOS) methods usually require dense annotation, which is often inaccessible for medical datasets. Furthermore, they suffer from accumulative errors and a lack of explicit space-time awareness. In this work, we propose a novel two-shot training paradigm for BUS video segmentation. It not only is able to capture free-range space-time consistency but also utilizes a source-dependent augmentation scheme. This label-efficient learning framework is validated on a challenging in-house BUS video dataset. Results showed that it gained comparable performance to the fully annotated ones given only 1.9% training labels.
Abstract:Interactive medical image segmentation refers to the accurate segmentation of the target of interest through interaction (e.g., click) between the user and the image. It has been widely studied in recent years as it is less dependent on abundant annotated data and more flexible than fully automated segmentation. However, current studies have not fully explored user-provided prompt information (e.g., points), including the knowledge mined in one interaction, and the relationship between multiple interactions. Thus, in this paper, we introduce a novel framework equipped with prompt enhancement, called PE-MED, for interactive medical image segmentation. First, we introduce a Self-Loop strategy to generate warm initial segmentation results based on the first prompt. It can prevent the highly unfavorable scenarios, such as encountering a blank mask as the initial input after the first interaction. Second, we propose a novel Prompt Attention Learning Module (PALM) to mine useful prompt information in one interaction, enhancing the responsiveness of the network to user clicks. Last, we build a Time Series Information Propagation (TSIP) mechanism to extract the temporal relationships between multiple interactions and increase the model stability. Comparative experiments with other state-of-the-art (SOTA) medical image segmentation algorithms show that our method exhibits better segmentation accuracy and stability.