Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Harry Cheng

Fair Deepfake Detectors Can Generalize

Jul 03, 2025

Harry Cheng, Ming-Hui Liu, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli

Abstract:Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often demonstrate that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three cross-domain benchmarks, DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors, validating both its theoretical foundation and practical effectiveness.

* 14 pages, version 1

Via

Access Paper or Ask Questions

Learning Real Facial Concepts for Independent Deepfake Detection

May 07, 2025

Ming-Hui Liu, Harry Cheng, Tianyi Wang, Xin Luo, Xin-Shun Xu

Abstract:Deepfake detection models often struggle with generalization to unseen datasets, manifesting as misclassifying real instances as fake in target domains. This is primarily due to an overreliance on forgery artifacts and a limited understanding of real faces. To address this challenge, we propose a novel approach RealID to enhance generalization by learning a comprehensive concept of real faces while assessing the probabilities of belonging to the real and fake classes independently. RealID comprises two key modules: the Real Concept Capture Module (RealC2) and the Independent Dual-Decision Classifier (IDC). With the assistance of a MultiReal Memory, RealC2 maintains various prototypes for real faces, allowing the model to capture a comprehensive concept of real class. Meanwhile, IDC redefines the classification strategy by making independent decisions based on the concept of the real class and the presence of forgery artifacts. Through the combined effect of the above modules, the influence of forgery-irrelevant patterns is alleviated, and extensive experiments on five widely used datasets demonstrate that RealID significantly outperforms existing state-of-the-art methods, achieving a 1.74% improvement in average accuracy.

* Accepted by IJCAI 2025

Via

Access Paper or Ask Questions

FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks

Apr 13, 2025

Tianyi Wang, Harry Cheng, Ming-Hui Liu, Mohan Kankanhalli

Abstract:Proactive Deepfake detection via robust watermarks has been raised ever since passive Deepfake detectors encountered challenges in identifying high-quality synthetic images. However, while demonstrating reasonable detection performance, they lack localization functionality and explainability in detection results. Additionally, the unstable robustness of watermarks can significantly affect the detection performance accordingly. In this study, we propose novel fractal watermarks for proactive Deepfake detection and localization, namely FractalForensics. Benefiting from the characteristics of fractals, we devise a parameter-driven watermark generation pipeline that derives fractal-based watermarks and conducts one-way encryption regarding the parameters selected. Subsequently, we propose a semi-fragile watermarking framework for watermark embedding and recovery, trained to be robust against benign image processing operations and fragile when facing Deepfake manipulations in a black-box setting. Meanwhile, we introduce an entry-to-patch strategy that implicitly embeds the watermark matrix entries into image patches at corresponding positions, achieving localization of Deepfake manipulations. Extensive experiments demonstrate satisfactory robustness and fragility of our approach against common image processing operations and Deepfake manipulations, outperforming state-of-the-art semi-fragile watermarking algorithms and passive detectors for Deepfake detection. Furthermore, by highlighting the areas manipulated, our method provides explainability for the proactive Deepfake detection results.

Via

Access Paper or Ask Questions

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping

Mar 24, 2025

Tianyi Wang, Harry Cheng, Xiao Zhang, Yinglong Wang

Abstract:Suffering from performance bottlenecks in passively detecting high-quality Deepfake images due to the advancement of generative models, proactive perturbations offer a promising approach to disabling Deepfake manipulations by inserting signals into benign images. However, existing proactive perturbation approaches remain unsatisfactory in several aspects: 1) visual degradation due to direct element-wise addition; 2) limited effectiveness against face swapping manipulation; 3) unavoidable reliance on white- and grey-box settings to involve generative models during training. In this study, we analyze the essence of Deepfake face swapping and argue the necessity of protecting source identities rather than target images, and we propose NullSwap, a novel proactive defense approach that cloaks source image identities and nullifies face swapping under a pure black-box scenario. We design an Identity Extraction module to obtain facial identity features from the source image, while a Perturbation Block is then devised to generate identity-guided perturbations accordingly. Meanwhile, a Feature Block extracts shallow-level image features, which are then fused with the perturbation in the Cloaking Block for image reconstruction. Furthermore, to ensure adaptability across different identity extractors in face swapping algorithms, we propose Dynamic Loss Weighting to adaptively balance identity losses. Experiments demonstrate the outstanding ability of our approach to fool various identity recognition models, outperforming state-of-the-art proactive perturbations in preventing face swapping models from generating images with correct source identities.

Via

Access Paper or Ask Questions

LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Nov 26, 2024

Tianyi Wang, Mengxiao Huang, Harry Cheng, Xiao Zhang, Zhiqi Shen

Figure 1 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Figure 2 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Figure 3 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Figure 4 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Abstract:Deepfake facial manipulation has garnered significant public attention due to its impacts on enhancing human experiences and posing privacy threats. Despite numerous passive algorithms that have been attempted to thwart malicious Deepfake attacks, they mostly struggle with the generalizability challenge when confronted with hyper-realistic synthetic facial images. To tackle the problem, this paper proposes a proactive Deepfake detection approach by introducing a novel training-free landmark perceptual watermark, LampMark for short. We first analyze the structure-sensitive characteristics of Deepfake manipulations and devise a secure and confidential transformation pipeline from the structural representations, i.e. facial landmarks, to binary landmark perceptual watermarks. Subsequently, we present an end-to-end watermarking framework that imperceptibly and robustly embeds and extracts watermarks concerning the images to be protected. Relying on promising watermark recovery accuracies, Deepfake detection is accomplished by assessing the consistency between the content-matched landmark perceptual watermark and the robustly recovered watermark of the suspect image. Experimental results demonstrate the superior performance of our approach in watermark recovery and Deepfake detection compared to state-of-the-art methods across in-dataset, cross-dataset, and cross-manipulation scenarios.

* Accepted to ACM MM 2024

Via

Access Paper or Ask Questions

Social Debiasing for Fair Multi-modal LLMs

Aug 13, 2024

Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Liqiang Nie

Abstract:Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities. However, these models often inherit severe social biases from their training datasets, leading to unfair predictions based on attributes like race and gender. This paper addresses the issue of social biases in MLLMs by i) Introducing a comprehensive Counterfactual dataset with Multiple Social Concepts (CMSC), which provides a more diverse and extensive training set compared to existing datasets. ii) Proposing an Anti-Stereotype Debiasing strategy (ASD). Our method works by revisiting the MLLM training process, rescaling the autoregressive loss function, and improving data sampling methods to counteract biases. Through extensive experiments on various MLLMs, our CMSC dataset and ASD method demonstrate a significant reduction in social biases while maintaining the models' original performance.

Via

Access Paper or Ask Questions

Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

May 21, 2024

Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

Figure 1 for Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

Figure 2 for Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

Figure 3 for Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

Figure 4 for Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

Abstract:For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources, including human insights, pre-trained models, logic rules, and world knowledge, which can be leveraged to address these challenges. Furthermore, we propose a novel visual generation framework that incorporates a knowledge-based feedback module to iteratively refine the generation process. This module gradually improves the alignment between the generated content and user intentions. We demonstrate the efficacy of the proposed framework through preliminary results, highlighting the potential of knowledge-enhanced generative models for intention-aligned content generation.

Via

Access Paper or Ask Questions

LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

May 08, 2024

Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

Figure 1 for LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

Figure 2 for LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

Figure 3 for LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

Figure 4 for LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

Abstract:In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for object navigation task within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation and prompt templates for stability during training and zero-shot inference. We implement our method on Astro robot and deploy it in both simulated and real-world environments for zero-shot object navigation. We show that our proposed method can achieve an improvement of 1.38 - 13.38% in terms of text-to-image recall on different benchmark settings for the retrieval task. For object navigation, we show the benefit of our approach in simulation and real world, showing 5% and 16.67% improvement in terms of navigation success rate, respectively.

* Accepted to ICRA 2024

Via

Access Paper or Ask Questions

Diffusion Facial Forgery Detection

Jan 29, 2024

Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli

Abstract:Detecting diffusion-generated images has recently grown into an emerging research area. Existing diffusion-based datasets predominantly focus on general image generation. However, facial forgeries, which pose a more severe social risk, have remained less explored thus far. To address this gap, this paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images. DiFF comprises over 500,000 images that are synthesized using thirteen distinct generation methods under four conditions. In particular, this dataset leverages 30,000 carefully collected textual and visual prompts, ensuring the synthesis of images with both high fidelity and semantic consistency. We conduct extensive experiments on the DiFF dataset via a human test and several representative forgery detection methods. The results demonstrate that the binary detection accuracy of both human observers and automated detectors often falls below 30%, shedding light on the challenges in detecting diffusion-generated facial forgeries. Furthermore, we propose an edge graph regularization approach to effectively enhance the generalization capability of existing detectors.

* The dataset will be released at \url{https://github.com/xaCheng1996/DiFF}

Via

Access Paper or Ask Questions

Robust Identity Perceptual Watermark Against Deepfake Face Swapping

Nov 02, 2023

Tianyi Wang, Mengxiao Huang, Harry Cheng, Bin Ma, Yinglong Wang

Figure 1 for Robust Identity Perceptual Watermark Against Deepfake Face Swapping

Figure 2 for Robust Identity Perceptual Watermark Against Deepfake Face Swapping

Figure 3 for Robust Identity Perceptual Watermark Against Deepfake Face Swapping

Figure 4 for Robust Identity Perceptual Watermark Against Deepfake Face Swapping

Abstract:Notwithstanding offering convenience and entertainment to society, Deepfake face swapping has caused critical privacy issues with the rapid development of deep generative models. Due to imperceptible artifacts in high-quality synthetic images, passive detection models against face swapping in recent years usually suffer performance damping regarding the generalizability issue. Therefore, several studies have been attempted to proactively protect the original images against malicious manipulations by inserting invisible signals in advance. However, the existing proactive defense approaches demonstrate unsatisfactory results with respect to visual quality, detection accuracy, and source tracing ability. In this study, we propose the first robust identity perceptual watermarking framework that concurrently performs detection and source tracing against Deepfake face swapping proactively. We assign identity semantics regarding the image contents to the watermarks and devise an unpredictable and unreversible chaotic encryption system to ensure watermark confidentiality. The watermarks are encoded and recovered by jointly training an encoder-decoder framework along with adversarial image manipulations. Extensive experiments demonstrate state-of-the-art performance against Deepfake face swapping under both cross-dataset and cross-manipulation settings.

* Submitted for review

Via

Access Paper or Ask Questions