Abstract:Modern Earth Observation systems provide sensing data at different temporal and spatial resolutions. Among optical sensors, today the Sentinel-2 program supplies high-resolution temporal (every 5 days) and high spatial resolution (10m) images that can be useful to monitor land cover dynamics. On the other hand, Very High Spatial Resolution images (VHSR) are still an essential tool to figure out land cover mapping characterized by fine spatial patterns. Understand how to efficiently leverage these complementary sources of information together to deal with land cover mapping is still challenging. With the aim to tackle land cover mapping through the fusion of multi-temporal High Spatial Resolution and Very High Spatial Resolution satellite images, we propose an End-to-End Deep Learning framework, named M3Fusion, able to leverage simultaneously the temporal knowledge contained in time series data as well as the fine spatial information available in VHSR information. Experiments carried out on the Reunion Island study area asses the quality of our proposal considering both quantitative and qualitative aspects.