Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Su Zhou

Penny-Wise and Pound-Foolish in Deepfake Detection

Aug 15, 2024

Yabin Wang, Zhiwu Huang, Su Zhou, Adam Prugel-Bennett, Xiaopeng Hong

Figure 1 for Penny-Wise and Pound-Foolish in Deepfake Detection

Figure 2 for Penny-Wise and Pound-Foolish in Deepfake Detection

Figure 3 for Penny-Wise and Pound-Foolish in Deepfake Detection

Figure 4 for Penny-Wise and Pound-Foolish in Deepfake Detection

Abstract:The diffusion of deepfake technologies has sparked serious concerns about its potential misuse across various domains, prompting the urgent need for robust detection methods. Despite advancement, many current approaches prioritize short-term gains at expense of long-term effectiveness. This paper critiques the overly specialized approach of fine-tuning pre-trained models solely with a penny-wise objective on a single deepfake dataset, while disregarding the pound-wise balance for generalization and knowledge retention. To address this "Penny-Wise and Pound-Foolish" issue, we propose a novel learning framework (PoundNet) for generalization of deepfake detection on a pre-trained vision-language model. PoundNet incorporates a learnable prompt design and a balanced objective to preserve broad knowledge from upstream tasks (object classification) while enhancing generalization for downstream tasks (deepfake detection). We train PoundNet on a standard single deepfake dataset, following common practice in the literature. We then evaluate its performance across 10 public large-scale deepfake datasets with 5 main evaluation metrics-forming the largest benchmark test set for assessing the generalization ability of deepfake detection models, to our knowledge. The comprehensive benchmark evaluation demonstrates the proposed PoundNet is significantly less "Penny-Wise and Pound-Foolish", achieving a remarkable improvement of 19% in deepfake detection performance compared to state-of-the-art methods, while maintaining a strong performance of 63% on object classification tasks, where other deepfake detection models tend to be ineffective. Code and data are open-sourced at https://github.com/iamwangyabin/PoundNet.

Via

Access Paper or Ask Questions

AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Apr 30, 2024

Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

Figure 1 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Figure 2 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Figure 3 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Figure 4 for AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Abstract:AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks showcases AutoMM's superior performance in basic classification and regression tasks compared to existing AutoML tools, while also demonstrating competitive results in advanced tasks, aligning with specialized toolboxes designed for such purposes.

* Accepted at AutoML 2024 Conference

Via

Access Paper or Ask Questions