Abstract:Automatic prohibited item detection in X-ray images is crucial for public safety. However, most existing detection methods either rely on expensive box annotations to achieve high performance or use weak annotations but suffer from limited accuracy. To balance annotation cost and detection performance, we study Weakly Semi-Supervised X-ray Prohibited Item Detection with Points (WSSPID-P) and propose a novel \textbf{B}oundary-\textbf{C}ategory \textbf{R}efinement \textbf{Net}work (\textbf{BCR-Net}) that requires only a few box annotations and a large number of point annotations. BCR-Net is built based on Group R-CNN and introduces a new Boundary Refinement (BR) module and a new Category Refinement (CR) module. The BR module develops a dual attention mechanism to focus on both the boundaries and salient features of prohibited items. Meanwhile, the CR module incorporates contrastive branches into the heads of RPN and ROI by introducing a scale- and rotation-aware contrastive loss, enhancing intra-class consistency and inter-class separability in the feature space. Based on the above designs, BCR-Net effectively addresses the closely related problems of imprecise localization and inaccurate classification. Experimental results on public X-ray datasets show the effectiveness of BCR-Net, achieving significant performance improvements to state-of-the-art methods under limited annotations.
Abstract:Automatic detection of prohibited items in X-ray images plays a crucial role in public security. However, existing methods rely heavily on labor-intensive box annotations. To address this, we investigate X-ray prohibited item detection under labor-efficient point supervision and develop an intra-inter objectness learning network (I$^2$OL-Net). I$^2$OL-Net consists of two key modules: an intra-modality objectness learning (intra-OL) module and an inter-modality objectness learning (inter-OL) module. The intra-OL module designs a local focus Gaussian masking block and a global random Gaussian masking block to collaboratively learn the objectness in X-ray images. Meanwhile, the inter-OL module introduces the wavelet decomposition-based adversarial learning block and the objectness block, effectively reducing the modality discrepancy and transferring the objectness knowledge learned from natural images with box annotations to X-ray images. Based on the above, I$^2$OL-Net greatly alleviates the problem of part domination caused by severe intra-class variations in X-ray images. Experimental results on four X-ray datasets show that I$^2$OL-Net can achieve superior performance with a significant reduction of annotation cost, thus enhancing its accessibility and practicality.