Abstract:The convolutional-based methods provide good segmentation performance in the medical image segmentation task. However, those methods have the following challenges when dealing with the edges of the medical images: (1) Previous convolutional-based methods do not focus on the boundary relationship between foreground and background around the segmentation edge, which leads to the degradation of segmentation performance when the edge changes complexly. (2) The inductive bias of the convolutional layer cannot be adapted to complex edge changes and the aggregation of multiple-segmented areas, resulting in its performance improvement mostly limited to segmenting the body of segmented areas instead of the edge. To address these challenges, we propose the CM-MLP framework on MFI (Multi-scale Feature Interaction) block and ACRE (Axial Context Relation Encoder) block for accurate segmentation of the edge of medical image. In the MFI block, we propose the cascade multi-scale MLP (Cascade MLP) to process all local information from the deeper layers of the network simultaneously and utilize a cascade multi-scale mechanism to fuse discrete local information gradually. Then, the ACRE block is used to make the deep supervision focus on exploring the boundary relationship between foreground and background to modify the edge of the medical image. The segmentation accuracy (Dice) of our proposed CM-MLP framework reaches 96.96%, 96.76%, and 82.54% on three benchmark datasets: CVC-ClinicDB dataset, sub-Kvasir dataset, and our in-house dataset, respectively, which significantly outperform the state-of-the-art method. The source code and trained models will be available at https://github.com/ProgrammerHyy/CM-MLP.
Abstract:Many recently developed object detectors focused on coarse-to-fine framework which contains several stages that classify and regress proposals from coarse-grain to fine-grain, and obtains more accurate detection gradually. Multi-resolution models such as Feature Pyramid Network(FPN) integrate information of different levels of resolution and effectively improve the performance. Previous researches also have revealed that localization can be further improved by: 1) using fine-grained information which is more translational variant; 2) refining local areas which is more focused on local boundary information. Based on these principles, we designed a novel boundary refinement architecture to improve localization accuracy by combining coarse-to-fine framework with feature pyramid structure, named as Pyramidal Bounding Box Refinement network(PBRnet), which parameterizes gradually focused boundary areas of objects and leverages lower-level feature maps to extract finer local information when refining the predicted bounding boxes. Extensive experiments are performed on the MS-COCO dataset. The PBRnet brings a significant performance gains by roughly 3 point of $mAP$ when added to FPN or Libra R-CNN. Moreover, by treating Cascade R-CNN as a coarse-to-fine detector and replacing its localization branch by the regressor of PBRnet, it leads an extra performance improvement by 1.5 $mAP$, yielding a total performance boosting by as high as 5 point of $mAP$.