Haitao Fu

PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

May 14, 2024

Yang Hou, Haitao Fu, Chuankai Chen, Zida Li, Haoyu Zhang, Jianjun Zhao

Abstract: With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Deepfake detection has emerged as a crucial strategy for countering these growing threats. However, although datasets are a key factor in training and validating deepfake detectors, most existing deepfake datasets focus primarily on the visual modality. The few that are multimodal employ outdated techniques, and their audio content is limited to a single language, so they fail to represent the cutting-edge advancements and globalization trends in current deepfake technologies. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies. We conduct comprehensive experiments using state-of-the-art detection methods on the PolyGlotFake dataset. These experiments demonstrate the dataset's significant challenges and its practical value in advancing research into multimodal deepfake detection.

* 13 pages, 4 figures

Channel-Spatial-Based Few-Shot Bird Sound Event Detection

Jun 25, 2023

Lingwen Liu, Yuxuan Feng, Haitao Fu, Yajie Yang, Xin Pan, Chenlei Jin

Abstract: In this paper, we propose a model for bird sound event detection that focuses on a small number of training samples within the everyday long-tail distribution. Accordingly, we investigate bird sound detection using the few-shot learning paradigm. By integrating channel and spatial attention mechanisms, improved feature representations can be learned from few-shot training datasets. We develop a Metric Channel-Spatial Network model by incorporating a Channel Spatial Squeeze-Excitation block into the prototype network, combining it with these attention mechanisms. We evaluate the Metric Channel-Spatial Network model on the DCASE 2022 Take5 dataset benchmark, achieving an F-measure of 66.84% and a PSDS of 58.98%. Our experiment demonstrates that the combination of channel and spatial attention mechanisms effectively enhances the performance of bird sound classification and detection.

* 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference
