Abstract: The mushrooming Deepfake synthetic materials circulated on the Internet have had a serious social impact on politicians, celebrities, and the public at large. In this paper, we provide a thorough review of existing models, following the development history of Deepfake detection studies, and define the research challenges of Deepfake detection in three aspects, namely transferability, interpretability, and reliability. While the transferability and interpretability challenges have both been frequently discussed and addressed with quantitative evaluations, the reliability issue has barely been considered, leading to a lack of reliable evidence in real-life settings and even in prosecutions of Deepfake-related cases in court. We therefore conduct a model reliability study using statistical random sampling and publicly available benchmark datasets to qualitatively validate the detection performance of existing models on arbitrary Deepfake candidates. A rarely discussed yet systematic data pre-processing procedure is demonstrated, along with fair training and testing experiments on the existing detection models. Case studies are further conducted on real-life Deepfake cases involving different groups of victims, with the help of reliably qualified detection models. The model reliability study provides a workflow by which detection models, once approved by authentication experts or institutions, can act as or assist evidence for Deepfake forensic investigation in court.
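For illustration only, the kind of random-sampling reliability check described in this abstract could be sketched as follows. The function name `reliability_estimate`, the detector interface, and the 385-video sample size (roughly a 5% margin of error at 95% confidence under the normal approximation) are assumptions for the sketch, not the paper's exact protocol.

```python
import math
import random

def reliability_estimate(detector, labeled_pool, sample_size=385, seed=0, z=1.96):
    """Estimate a detector's accuracy on arbitrary candidates via simple random
    sampling from a labeled pool of (video_path, is_fake) pairs, returning the
    point estimate and a normal-approximation confidence interval."""
    rng = random.Random(seed)
    sample = rng.sample(list(labeled_pool), min(sample_size, len(labeled_pool)))
    correct = sum(1 for path, is_fake in sample if detector(path) == is_fake)
    n = len(sample)
    p = correct / n                          # observed accuracy on the random sample
    margin = z * math.sqrt(p * (1 - p) / n)  # half-width of the confidence interval
    return p, (max(0.0, p - margin), min(1.0, p + margin))
```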
Abstract: Recently, Deepfake has drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the Deepfake videos spreading widely across the Internet become more realistic, traditional detection techniques have failed to distinguish the real from the fake. Most existing deep learning methods mainly focus on local features and relations within the face image, using convolutional neural networks as the backbone. However, local features and relations alone are insufficient for a model to learn sufficiently general information for Deepfake detection. The existing Deepfake detection methods have therefore reached a bottleneck in further improving detection performance. To address this issue, we propose a deep convolutional Transformer that incorporates decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we employ the rarely discussed video keyframes in model training for performance improvement and visualize the feature-quantity gap between keyframes and normal frames caused by video compression. Finally, we illustrate transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within- and cross-dataset experiments.
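As a rough illustration of the re-attention idea mentioned in this abstract, the following PyTorch sketch recombines per-head attention maps with a learnable head-mixing transform (in the style of DeepViT's re-attention). The class name, hyperparameters, and normalization placement are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Multi-head self-attention whose per-head attention maps are recombined by a
    learnable 1x1 convolution across the head dimension before being applied to the
    values (head mixing in the style of DeepViT's re-attention)."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.head_mix = nn.Conv2d(num_heads, num_heads, kernel_size=1)  # mixes heads
        self.norm = nn.BatchNorm2d(num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (batch, tokens, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)       # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)                # standard attention maps
        attn = self.norm(self.head_mix(attn))      # re-attention: recombine heads
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: 2 face crops, 197 patch tokens each, 256-dim embeddings.
tokens = torch.randn(2, 197, 256)
print(ReAttention(dim=256, num_heads=8)(tokens).shape)  # torch.Size([2, 197, 256])
```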