Jiaqi Ji

Learning to Disentangle GAN Fingerprint for Fake Image Attribution

Jun 16, 2021

Tianyun Yang, Juan Cao, Qiang Sheng, Lei Li, Jiaqi Ji, Xirong Li, Sheng Tang

Abstract: The rapid progress of generative models has brought new threats to visual forensics, such as malicious impersonation and digital copyright infringement, which has prompted work on fake image attribution. Existing work on fake image attribution mainly relies on a direct classification framework. Without additional supervision, the extracted features can include many content-relevant components and generalize poorly. Meanwhile, how to obtain an interpretable GAN fingerprint that explains the decision remains an open question. Adopting a multi-task framework, we propose a GAN Fingerprint Disentangling Network (GFD-Net) to simultaneously disentangle the fingerprint from GAN-generated images and produce a content-irrelevant representation for fake image attribution. A series of constraints is provided to guarantee the stability and discriminability of the fingerprint, which in turn helps content-irrelevant feature extraction. Further, we perform a comprehensive analysis of the GAN fingerprint, providing some clues about its properties and about which factors in the GAN architecture dominate it. Experiments show that our GFD-Net achieves superior fake image attribution performance in both closed-world and open-world testing. We also apply our method to binary fake image detection, where it exhibits significant generalization ability on unseen generators.

* 10 pages, 7 figures

Image Manipulation Detection by Multi-View Multi-Scale Supervision

Apr 14, 2021

Xinru Chen, Chengbo Dong, Jiaqi Ji, Juan Cao, Xirong Li

Abstract: The key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, while specific enough to prevent false alarms on authentic images. Current research emphasizes sensitivity and overlooks specificity. In this paper we address both aspects by multi-view feature learning and multi-scale supervision. By exploiting the noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images, which are nontrivial to take into account with current methods based on semantic segmentation networks. Our ideas are realized by a new network which we term MVSS-Net. Extensive experiments on five benchmark sets justify the viability of MVSS-Net for both pixel-level and image-level manipulation detection.

SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries

Nov 24, 2020

Xirong Li, Fangming Zhou, Chaoxi Xu, Jiaqi Ji, Gang Yang

Abstract: Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a core theme in multimedia data management and retrieval. The success of AVS depends on cross-modal representation learning that encodes both query sentences and videos into common spaces for semantic similarity computation. Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders. The novelty of the proposed method, which we term Sentence Encoder Assembly (SEA), is two-fold. First, different from prior art that uses only a single common space, SEA supports text-video matching in multiple encoder-specific common spaces. Such a property prevents the matching from being dominated by a specific encoder that produces an encoding vector much longer than those of other encoders. Second, in order to explore complementarities among the individual common spaces, we propose multi-space multi-loss learning. As extensive experiments on four benchmarks (MSR-VTT, TRECVID AVS 2016-2019, TGIF and MSVD) show, SEA surpasses the state of the art. In addition, SEA is extremely easy to implement. All this makes SEA an appealing solution for AVS and promising for continuously advancing the task by harvesting new sentence encoders.

* accepted for publication as a REGULAR paper in the IEEE Transactions on Multimedia
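
The idea of matching in several encoder-specific common spaces can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity, the unweighted sum over spaces, and the space names `bow` and `w2v` are all assumptions made for illustration. The point it shows is that each space contributes a similarity bounded in [-1, 1], so a space with a longer embedding vector does not outweigh the others.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_space_similarity(text_embs, video_embs):
    """Sum the per-space cosine similarities (assumed combination rule).

    text_embs / video_embs: dicts mapping a hypothetical space name to the
    query / video embedding already projected into that common space.
    """
    return sum(cosine(text_embs[name], video_embs[name]) for name in text_embs)


# Hypothetical embeddings: the "bow" space is 4-d, the "w2v" space is 2-d.
text = {"bow": [1.0, 0.0, 0.0, 0.0], "w2v": [0.0, 2.0]}
video = {"bow": [3.0, 0.0, 0.0, 0.0], "w2v": [5.0, 0.0]}

# "bow" vectors are parallel (similarity 1), "w2v" vectors are orthogonal
# (similarity 0); each space contributes independently of its dimension.
score = multi_space_similarity(text, video)
```

Because each space is scored separately before combination, adding a new sentence encoder in this sketch only means adding one more entry to each dictionary.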
