Arjun Pankajakshan

Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach

Apr 19, 2024

Mohammed Yousif, Jonat John Mathew, Huzaifa Pallan, Agamjeet Singh Padda, Syed Daniyal Shah, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Abstract: Generalization in audio deepfake detection presents a significant challenge, with models trained on specific datasets often struggling to detect deepfakes generated under varying conditions and by unknown algorithms. While collectively training a model using diverse datasets can enhance its generalization ability, it comes with high computational costs. To address this, we propose a neural collapse-based sampling approach applied to pre-trained models trained on distinct datasets to create a new training database. Using the ASVspoof 2019 dataset as a proof of concept, we implement pre-trained models with ResNet and ConvNeXt architectures. Our approach demonstrates comparable generalization on unseen data while being computationally efficient, requiring less training data. Evaluation is conducted using the In-the-Wild dataset.
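The abstract does not spell out the selection rule, so the following is only a minimal sketch of one plausible reading of neural collapse-based sampling: since neural collapse describes penultimate-layer features concentrating around their class means, the sketch keeps, per class, the k samples whose embeddings lie closest to that class's mean. The function names, the Euclidean distance, and the parameter `k` are assumptions made for illustration, not the authors' method.

```python
import math
from collections import defaultdict


def class_means(embeddings, labels):
    """Return {label: mean embedding vector} over all samples of that label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def sample_near_class_means(embeddings, labels, k):
    """Pick, per class, the indices of the k samples closest to the class mean.

    Hypothetical selection rule: Euclidean distance to the class mean.
    """
    means = class_means(embeddings, labels)
    by_class = defaultdict(list)
    for idx, (vec, lab) in enumerate(zip(embeddings, labels)):
        by_class[lab].append((math.dist(vec, means[lab]), idx))
    selected = []
    for lab in sorted(by_class):
        ranked = sorted(by_class[lab])  # nearest to the class mean first
        selected.extend(idx for _, idx in ranked[:k])
    return sorted(selected)
```

In practice the embeddings would come from the penultimate layer of a pre-trained detector, and the selected indices would define the reduced training set.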

Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms

Mar 18, 2024

Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Abstract: Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. An executable application is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on ResNet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performance relative to the ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.
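One common way to apply a static (fixed-input) detector to a live stream is to score overlapping fixed-length windows of the buffered audio. The sketch below illustrates that idea only; the window and hop sizes, the function names, and the `score_fn` placeholder standing in for a trained model are all assumptions, since the abstract does not describe the actual pipeline.

```python
def sliding_windows(samples, window, hop):
    """Yield (start_index, chunk) for every full window over the buffered samples."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    for start in range(0, len(samples) - window + 1, hop):
        yield start, samples[start:start + window]


def score_stream(samples, window, hop, score_fn):
    """Apply a static scoring function to each window of the stream.

    score_fn is a hypothetical stand-in for a trained detector that maps a
    fixed-length chunk of audio samples to a spoof score.
    """
    return [(start, score_fn(chunk))
            for start, chunk in sliding_windows(samples, window, hop)]
```

A smaller hop gives more frequent decisions at the cost of more model evaluations per second, which is the main latency trade-off in this kind of setup.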

Memory Controlled Sequential Self Attention for Sound Recognition

Jun 11, 2020

Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos

Abstract: In this paper, we investigate the importance of the extent of memory in sequential self attention for sound recognition. We propose to use a memory controlled sequential self attention mechanism on top of a convolutional recurrent neural network (CRNN) model for polyphonic sound event detection (SED). Experiments on the URBAN-SED dataset demonstrate the impact of the extent of memory on sound recognition performance with the self attention induced SED model. We extend the proposed idea with a multi-head self attention mechanism where each attention head processes the audio embedding with explicit attention width values. The proposed use of memory controlled sequential self attention offers a way to induce relations among frames of sound event tokens. We show that our memory controlled self attention model achieves an event-based F-score of 33.92% on the URBAN-SED dataset, outperforming the F-score of 20.10% reported by the model without self attention.

* Submitted to INTERSPEECH 2020
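
The idea of limiting the extent of memory in self attention can be illustrated with a minimal sketch in which each frame attends only to frames within a fixed attention width. This is a simplification under stated assumptions: the window is taken to be symmetric around each frame, the frames themselves serve as queries, keys, and values (no learned projections), and the function name and `width` parameter are hypothetical rather than taken from the paper.

```python
import math


def windowed_self_attention(frames, width):
    """Scaled dot-product self-attention restricted to a local window.

    Frame t attends only to frames s with |t - s| <= width (assumed symmetric).
    """
    n = len(frames)
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for t in range(n):
        lo, hi = max(0, t - width), min(n, t + width + 1)
        # Similarity of frame t with each frame inside its window.
        scores = [scale * sum(a * b for a, b in zip(frames[t], frames[s]))
                  for s in range(lo, hi)]
        # Numerically stable softmax over the window.
        peak = max(scores)
        exps = [math.exp(x - peak) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the frames in the window.
        outputs.append([sum(w * frames[s][d] for w, s in zip(weights, range(lo, hi)))
                        for d in range(dim)])
    return outputs
```

A multi-head variant along the lines the abstract describes could call this once per head with a different `width` and concatenate the results.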