LandCover.ai (Land Cover from Aerial Imagery) dataset for aerial images, alongside the corresponding dataset sourced from Sentinel 2 data. Through the fusion of both sources, the mean Intersection over Union (mIoU) achieved a total mIoU of 84.91% without data augmentation.
Aerial and satellite imagery are inherently complementary remote sensing sources, offering high-resolution detail alongside expansive spatial coverage. However, the use of these sources for land cover segmentation introduces several challenges, prompting the development of a variety of segmentation methods. Among these approaches, the DeepLabV3+ architecture is considered as a promising approach in the field of single-source image segmentation. However, despite its reliable results for segmentation, there is still a need to increase its robustness and improve its performance. This is particularly crucial for multimodal image segmentation, where the fusion of diverse types of information is essential. An interesting approach involves enhancing this architectural framework through the integration of novel components and the modification of certain internal processes. In this paper, we enhance the DeepLabV3+ architecture by introducing a new transposed conventional layers block for upsampling a second entry to fuse it with high level features. This block is designed to amplify and integrate information from satellite images, thereby enriching the segmentation process through fusion with aerial images. For experiments, we used the