Abstract:This paper presents a novel mid-wave infrared (MWIR) small target detection dataset (MWIRSTD) comprising 14 video sequences containing approximately 1053 images with annotated targets of three distinct classes of small objects. Captured using cooled MWIR imagers, the dataset offers a unique opportunity for researchers to develop and evaluate state-of-the-art methods for small object detection in realistic MWIR scenes. Unlike existing datasets, which primarily consist of uncooled thermal images or synthetic data with targets superimposed onto the background or vice versa, MWIRSTD provides authentic MWIR data with diverse targets and environments. Extensive experiments on various traditional methods and deep learning-based techniques for small target detection are performed on the proposed dataset, providing valuable insights into their efficacy. The dataset and code are available at https://github.com/avinres/MWIRSTD.
Abstract:Human pose estimation faces hurdles in real-world applications due to factors like lighting changes, occlusions, and cluttered environments. We introduce a unique RGB-Thermal Nearly Paired and Annotated 2D Pose Dataset, comprising over 2,400 high-quality LWIR (thermal) images. Each image is meticulously annotated with 2D human poses, offering a valuable resource for researchers and practitioners. This dataset, captured from seven actors performing diverse everyday activities like sitting, eating, and walking, facilitates pose estimation on occlusion and other challenging scenarios. We benchmark state-of-the-art pose estimation methods on the dataset to showcase its potential, establishing a strong baseline for future research. Our results demonstrate the dataset's effectiveness in promoting advancements in pose estimation for various applications, including surveillance, healthcare, and sports analytics. The dataset and code are available at https://github.com/avinres/LWIRPOSE
Abstract:Synthesizing high quality saliency maps from noisy images is a challenging problem in computer vision and has many practical applications. Samples generated by existing techniques for saliency detection cannot handle the noise perturbations smoothly and fail to delineate the salient objects present in the given scene. In this paper, we present a novel end-to-end coupled Denoising based Saliency Prediction with Generative Adversarial Network (DSAL-GAN) framework to address the problem of salient object detection in noisy images. DSAL-GAN consists of two generative adversarial-networks (GAN) trained end-to-end to perform denoising and saliency prediction altogether in a holistic manner. The first GAN consists of a generator which denoises the noisy input image, and in the discriminator counterpart we check whether the output is a denoised image or ground truth original image. The second GAN predicts the saliency maps from raw pixels of the input denoised image using a data-driven metric based on saliency prediction method with adversarial loss. Cycle consistency loss is also incorporated to further improve salient region prediction. We demonstrate with comprehensive evaluation that the proposed framework outperforms several baseline saliency models on various performance benchmarks.