We present a novel multimodal dataset developed by expert astronomers to automate the detection and localisation of multi-component extended radio galaxies and their corresponding infrared hosts. The dataset comprises 4,155 galaxy instances in 2,800 images with both radio and infrared modalities. Each instance is annotated with the extended radio galaxy class, a bounding box that encompasses all of its components, a pixel-level segmentation mask, and the position of its corresponding infrared host galaxy. It is the first publicly accessible dataset that combines images from a highly sensitive radio telescope and an infrared satellite with instance-level annotations for identifying radio galaxies and their infrared hosts. We benchmark several object detection algorithms on the dataset and propose a novel multimodal approach that identifies radio galaxies and the positions of their infrared hosts simultaneously.
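As an illustration of the per-instance annotation described above, the following is a minimal sketch of how one record might be represented. The field names, the polygon encoding of the mask, the `(x_min, y_min, width, height)` box convention, and the class label in the example are all assumptions made for illustration and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class RadioGalaxyInstance:
    """Hypothetical record for one annotated galaxy (field names are assumptions)."""
    category: str                            # extended radio galaxy class
    components: List[List[Point]]            # one mask polygon per radio component
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, width, height), assumed convention
    host_xy: Point                           # infrared host position in pixel coordinates


def bbox_from_components(components: List[List[Point]]) -> Tuple[float, float, float, float]:
    """Return the smallest axis-aligned box enclosing every component polygon."""
    xs = [x for polygon in components for x, _ in polygon]
    ys = [y for polygon in components for _, y in polygon]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


# Two separate components of a single galaxy (made-up coordinates).
example_components = [
    [(10.0, 20.0), (20.0, 20.0), (20.0, 30.0)],
    [(40.0, 50.0), (50.0, 50.0), (50.0, 60.0)],
]

example = RadioGalaxyInstance(
    category="example_class",                # placeholder label, not a dataset class name
    components=example_components,
    bbox=bbox_from_components(example_components),
    host_xy=(30.0, 40.0),
)
```

Here a single box spans both components, which mirrors the requirement that the bounding box encompass the whole multi-component galaxy rather than each component separately.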