Abstract:Nowadays, pre-trained encoders are widely used in medical image segmentation because of their ability to capture complex feature representations. However, the existing models fail to effectively utilize the rich features obtained by the pre-trained encoder, resulting in suboptimal segmentation results. In this work, a novel U-shaped model, called FIF-UNet, is proposed to address the above issue, including three plug-and-play modules. A channel spatial interaction module (CSI) is proposed to obtain informative features by establishing the interaction between encoder stages and corresponding decoder stages. A cascaded conv-SE module (CoSE) is designed to enhance the representation of critical features by adaptively assigning importance weights on different feature channels. A multi-level fusion module (MLF) is proposed to fuse the multi-scale features from the decoder stages, ensuring accurate and robust final segmentation. Comprehensive experiments on the Synapse and ACDC datasets demonstrate that the proposed FIF-UNet outperforms existing state-of-the-art methods, which achieves the highest average DICE of 86.05% and 92.58%, respectively.