Abstract:Object detection in remote sensing is a crucial computer vision task that has seen significant advancements with deep learning techniques. However, most existing works in this area focus on the use of generic object detection and do not leverage the potential of multimodal data fusion. In this paper, we present a comparison of methods for multimodal object detection in remote sensing, survey available multimodal datasets suitable for evaluation, and discuss future directions.