Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiong Xu

Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing on Low-level Features

Oct 10, 2023

Xiaochen Ma, Jizhe Zhou, Xiong Xu, Zhuohang Jiang, Chi-Man Pun

Figure 1 for Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing on Low-level Features

Figure 2 for Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing on Low-level Features

Figure 3 for Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing on Low-level Features

Figure 4 for Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing on Low-level Features

Abstract:Nowadays, multimedia forensics faces unprecedented challenges due to the rapid advancement of multimedia generation technology thereby making Image Manipulation Localization (IML) crucial in the pursuit of truth. The key to IML lies in revealing the artifacts or inconsistencies between the tampered and authentic areas, which are evident under pixel-level features. Consequently, existing studies treat IML as a low-level vision task, focusing on allocating tampered masks by crafting pixel-level features such as image RGB noises, edge signals, or high-frequency features. However, in practice, tampering commonly occurs at the object level, and different classes of objects have varying likelihoods of becoming targets of tampering. Therefore, object semantics are also vital in identifying the tampered areas in addition to pixel-level features. This necessitates IML models to carry out a semantic understanding of the entire image. In this paper, we reformulate the IML task as a high-level vision task that greatly benefits from low-level features. Based on such an interpretation, we propose a method to enhance the Masked Autoencoder (MAE) by incorporating high-resolution inputs and a perceptual loss supervision module, which is termed Perceptual MAE (PMAE). While MAE has demonstrated an impressive understanding of object semantics, PMAE can also compensate for low-level semantics with our proposed enhancements. Evidenced by extensive experiments, this paradigm effectively unites the low-level and high-level features of the IML task and outperforms state-of-the-art tampering localization methods on all five publicly available datasets.

Via

Access Paper or Ask Questions