Abstract:With the rapid advances of image editing techniques in recent years, image manipulation detection has attracted considerable attention since the increasing security risks posed by tampered images. To address these challenges, a novel multi-scale multi-grained deep network (MSMG-Net) is proposed to automatically identify manipulated regions. In our MSMG-Net, a parallel multi-scale feature extraction structure is used to extract multi-scale features. Then the multi-grained feature learning is utilized to perceive object-level semantics relation of multi-scale features by introducing the shunted self-attention. To fuse multi-scale multi-grained features, global and local feature fusion block are designed for manipulated region segmentation by a bottom-up approach and multi-level feature aggregation block is designed for edge artifacts detection by a top-down approach. Thus, MSMG-Net can effectively perceive the object-level semantics and encode the edge artifact. Experimental results on five benchmark datasets justify the superior performance of the proposed method, outperforming state-of-the-art manipulation detection and localization methods. Extensive ablation experiments and feature visualization demonstrate the multi-scale multi-grained learning can present effective visual representations of manipulated regions. In addition, MSMG-Net shows better robustness when various post-processing methods further manipulate images.