Abstract: Infrared small target detection (ISTD) has a wide range of applications in early warning, rescue, and guidance. However, CNN-based deep learning methods are not effective at segmenting infrared small targets (IRSTs), which lack clear contour and texture features, and transformer-based methods also struggle to achieve strong results due to the absence of convolutional inductive bias. To address these issues, we propose a new model called attention with bilinear correlation (ABC). It is based on the transformer architecture and includes a convolution linear fusion transformer (CLFT) module with a novel attention mechanism for feature extraction and fusion, which effectively enhances target features and suppresses noise. Additionally, our model includes a u-shaped convolution-dilated convolution (UCDC) module located in the deeper layers of the network, which takes advantage of the smaller resolution of deeper features to obtain finer semantic information. Experimental results on public datasets demonstrate that our approach achieves state-of-the-art performance. Code is available at https://github.com/PANPEIWEN/ABC
Abstract: Infrared small object segmentation (ISOS) aims to segment small objects that cover only a few pixels from cluttered backgrounds in infrared images. It is highly challenging because: 1) small objects lack sufficient intensity, shape, and texture information; 2) small objects are easily lost when detection models, such as deep neural networks, obtain high-level semantic features and image-level receptive fields through successive downsampling. This paper proposes a reliable detection model for ISOS, dubbed UCFNet, which handles both issues well. It builds upon central difference convolution (CDC) and fast Fourier convolution (FFC). On the one hand, CDC can effectively guide the network to learn the contrast information between small objects and the background, since contrast information is essential to how the human visual system handles the ISOS task. On the other hand, FFC can gain image-level receptive fields and extract global information while preventing small objects from being overwhelmed. Experiments on several public datasets demonstrate that our method significantly outperforms the state-of-the-art ISOS models and can provide useful guidelines for designing better ISOS deep models. Code will be available soon.
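To make the contrast-learning role of CDC concrete, the following is a minimal single-channel NumPy sketch. It assumes the commonly cited formulation of central difference convolution, in which a vanilla convolution response is reduced by the center pixel scaled by the kernel sum; this may differ from the exact variant used in UCFNet. The function name `central_difference_conv2d` and the blending parameter `theta` are hypothetical names chosen for illustration.

```python
import numpy as np


def central_difference_conv2d(x, w, theta=0.7):
    """Illustrative central difference convolution (hypothetical helper).

    Assumed formulation:
        y(p0) = sum_n w(pn) * x(p0 + pn) - theta * x(p0) * sum_n w(pn)
    x: (H, W) image, w: (k, k) kernel with odd k, zero padding, stride 1.
    theta = 0 gives a vanilla convolution; theta = 1 gives a pure
    central-difference response.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)  # zero padding on all sides
    h, wd = x.shape

    # Vanilla term: weighted sum over the k x k neighbourhood.
    out = np.zeros((h, wd))
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i:i + h, j:j + wd]

    # Central difference term: subtract the scaled center pixel.
    return out - theta * w.sum() * x
```

With `theta = 1` and an averaging kernel, the interior response on a flat background vanishes, while an isolated bright pixel yields a large-magnitude response. This illustrates why such an operator emphasizes local contrast rather than absolute intensity.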