Abstract:Tiny object detection has gained considerable attention in the research community owing to the frequent occurrence of tiny objects in numerous critical real-world scenarios. However, convolutional neural networks (CNNs) used as the backbone for object detection architectures typically neglect Nyquist's sampling theorem during down-sampling operations, resulting in aliasing and degraded performance. This is likely to be a particular issue for tiny objects that occupy very few pixels and therefore have high spatial frequency features. This paper applied an existing approach WaveCNet for anti-aliasing to tiny object detection. WaveCNet addresses aliasing by replacing standard down-sampling processes in CNNs with Wavelet Pooling (WaveletPool) layers, effectively suppressing aliasing. We modify the original WaveCNet to apply WaveletPool in a consistent way in both pathways of the residual blocks in ResNets. Additionally, we also propose a bottom-heavy version of the backbone, which further improves the performance of tiny object detection while also reducing the required number of parameters by almost half. Experimental results on the TinyPerson, WiderFace, and DOTA datasets demonstrate the importance of anti-aliasing in tiny object detection and the effectiveness of the proposed method which achieves new state-of-the-art results on all three datasets. Codes and experiment results are released at https://github.com/freshn/Anti-aliasing-Tiny-Object-Detection.git.
Abstract:Tiny object detection has become an active area of research because images with tiny targets are common in several important real-world scenarios. However, existing tiny object detection methods use standard deep neural networks as their backbone architecture. We argue that such backbones are inappropriate for detecting tiny objects as they are designed for the classification of larger objects, and do not have the spatial resolution to identify small targets. Specifically, such backbones use max-pooling or a large stride at early stages in the architecture. This produces lower resolution feature-maps that can be efficiently processed by subsequent layers. However, such low-resolution feature-maps do not contain information that can reliably discriminate tiny objects. To solve this problem we design 'bottom-heavy' versions of backbones that allocate more resources to processing higher-resolution features without introducing any additional computational burden overall. We also investigate if pre-training these backbones on images of appropriate size, using CIFAR100 and ImageNet32, can further improve performance on tiny object detection. Results on TinyPerson and WiderFace show that detectors with our proposed backbones achieve better results than the current state-of-the-art methods.