Alessandro Ilic Mezza

The IEEE-IS2 2024 Music Packet Loss Concealment Challenge

Sep 27, 2024

Alessandro Ilic Mezza, Alberto Bernardini

Abstract: We present the IEEE-IS2 2024 Music Packet Loss Concealment Challenge. We begin by detailing the challenge rules, followed by an overview of the provided baseline system, the blind test set, and the evaluation methodology used to determine the final ranking. This inaugural edition aimed to foster collaboration between researchers and practitioners from the fields of signal processing, machine learning, and networked music performance, while also laying the groundwork for future advancements in packet loss concealment for music signals.

* 8 pages, 4 figures, 3 tables. Official report of the IEEE-IS2 2024 Music Packet Loss Concealment Challenge, part of the 2nd International Workshop on Networked Immersive Audio

Leveraging Mixture of Experts for Improved Speech Deepfake Detection

Sep 24, 2024

Viola Negroni, Davide Salvi, Alessandro Ilic Mezza, Paolo Bestagini, Stefano Tubaro

Abstract: Speech deepfakes pose a significant threat to personal security and content authenticity. Several detectors have been proposed in the literature, and one of the primary challenges these systems face is generalizing to unseen data so that fake signals can be identified across a wide range of datasets. In this paper, we introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture. The Mixture of Experts framework is well-suited for the speech deepfake detection task due to its ability to specialize in different input types and handle data variability efficiently. This approach offers superior generalization and adaptability to unseen data compared to traditional single models or ensemble methods. Additionally, its modular structure supports scalable updates, making it more flexible in managing the evolving complexity of deepfake techniques while maintaining high detection accuracy. We propose an efficient, lightweight gating mechanism to dynamically assign expert weights for each input, optimizing detection performance. Experimental results across multiple datasets demonstrate the effectiveness and potential of our proposed approach.

* Submitted to ICASSP 2025
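
The idea of a gate that assigns per-input weights to several experts can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the gate, so a single linear layer followed by a softmax is used, and the names `gate_weights`, `mixture_score`, and `gate_matrix` are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(features, gate_matrix):
    """Hypothetical gate: one linear layer (one row per expert) plus softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in gate_matrix]
    return softmax(logits)


def mixture_score(features, experts, gate_matrix):
    """Combine the experts' scores using input-dependent gate weights."""
    weights = gate_weights(features, gate_matrix)
    scores = [expert(features) for expert in experts]
    return sum(w * s for w, s in zip(weights, scores))


# Two toy "experts" that always output a fixed fake-probability.
experts = [lambda x: 0.0, lambda x: 1.0]
# With an all-zero gate, both experts receive equal weight.
print(mixture_score([1.0, 0.0], experts, [[0.0, 0.0], [0.0, 0.0]]))
```

In a real detector the experts would be trained networks and the gate would be learned jointly with them; the sketch only shows how the weighted combination is formed.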

Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines

Mar 29, 2024

Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, Alberto Bernardini

Abstract: Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN) such that its output renders the perceptual qualities of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics, and outperforms existing methods based on genetic algorithms and analytical filter design.

* The article has been submitted to EURASIP Journal on Audio, Speech, and Music Processing on Jan 02, 2024 and is currently under review
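
For readers unfamiliar with the structure being tuned, the following is a minimal sketch of a plain time-invariant, frequency-independent FDN with integer delays. It is not the differentiable model from the paper and performs no learning; the function name `fdn_impulse_response` and the parameter names (`delays`, `feedback`, `b`, `c`) are assumptions chosen for illustration. It only shows which parameters (delay lengths, feedback matrix, input and output gains) such an optimization would have to find.

```python
def fdn_impulse_response(delays, feedback, b, c, n_samples):
    """Impulse response of a basic FDN.

    delays:   delay-line lengths in samples (positive integers)
    feedback: square feedback matrix as a list of rows
    b, c:     input and output gains, one per delay line
    """
    n = len(delays)
    buffers = [[0.0] * d for d in delays]  # one circular buffer per line
    ptr = [0] * n
    out = []
    for t in range(n_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse at t = 0
        # Read the sample written delays[i] steps ago on each line.
        s = [buffers[i][ptr[i]] for i in range(n)]
        out.append(sum(c[i] * s[i] for i in range(n)))
        # Write the input plus the fed-back mix of all line outputs.
        for i in range(n):
            v = b[i] * x + sum(feedback[i][j] * s[j] for j in range(n))
            buffers[i][ptr[i]] = v
            ptr[i] = (ptr[i] + 1) % delays[i]
    return out


# Two lines; only the first one recirculates, with gain 0.5.
ir = fdn_impulse_response([2, 3], [[0.5, 0.0], [0.0, 0.0]], [1.0, 1.0], [1.0, 1.0], 8)
print(ir)
```

Making the delay lengths themselves trainable, as the abstract describes, requires a differentiable formulation of the delay lines, which this integer-buffer sketch does not provide.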

Toward Deep Drum Source Separation

Dec 15, 2023

Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti

Abstract: In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.

* 9 pages, 2 figures. Submitted to Pattern Recognition Letters

A Deep Learning Approach for Low-Latency Packet Loss Concealment of Audio Signals in Networked Music Performance Applications

Jul 14, 2020

Prateek Verma, Alessandro Ilic Mezza, Chris Chafe, Cristina Rottondi

Abstract: Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims at revolutionizing the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network. Ensuring realistic conditions for music performance, however, constitutes a significant engineering challenge due to extremely strict requirements in terms of audio quality and, most importantly, network delay. To minimize the end-to-end delay experienced by the musicians, typical implementations of NMP applications use uncompressed, bidirectional audio streams and leverage UDP as the transport protocol. Because UDP is connectionless and unreliable, audio packets that are lost in transit are not retransmitted and thus cause glitches in the receiver's audio playout. This article describes a technique for predicting lost packet content in real time using a deep learning approach. The ability to conceal errors in real time can help mitigate audio impairments caused by packet losses, thus improving the quality of audio playout in real-world scenarios.

* 8 pages, 2 figures

Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching

Apr 30, 2020

Alessandro Ilic Mezza, Emanuël A. P. Habets, Meinard Müller, Augusto Sarti

Abstract: The performance of machine learning algorithms is known to be negatively affected by possible mismatches between training (source) and test (target) data distributions. In fact, this problem emerges whenever an acoustic scene classification system that has been trained on data recorded by a given device is applied to samples acquired under different acoustic conditions or captured by mismatched recording devices. To address this issue, we propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to those of the source-domain training dataset. This model-agnostic approach is devised to adapt audio samples from unseen devices before they are fed to a pre-trained classifier, thus avoiding any further learning phase. Using the DCASE 2018 Task 1-B development dataset, we show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.

* 5 pages, 1 figure, 3 tables, submitted to EUSIPCO 2020
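
The alignment of first- and second-order statistics per frequency band can be sketched as a standardize-then-rescale step. This is an illustration under stated assumptions, not the authors' implementation: spectrograms are represented as plain lists of bands (each band a list of frame values), statistics are pooled over all frames of a domain, and the names `band_stats`, `match_band_stats`, and `eps` are hypothetical.

```python
from statistics import fmean, pstdev


def band_stats(spectrograms):
    """Per-band (mean, std) pooled over all frames of all spectrograms."""
    n_bands = len(spectrograms[0])
    stats = []
    for b in range(n_bands):
        values = [v for spec in spectrograms for v in spec[b]]
        stats.append((fmean(values), pstdev(values)))
    return stats


def match_band_stats(target_spec, target_stats, source_stats, eps=1e-8):
    """Shift and scale each band of a target spectrogram so that its
    mean and standard deviation match the source-domain statistics."""
    adapted = []
    for band, (mu_t, sd_t), (mu_s, sd_s) in zip(target_spec, target_stats, source_stats):
        adapted.append([(v - mu_t) / (sd_t + eps) * sd_s + mu_s for v in band])
    return adapted


# Toy data: one source and one target spectrogram with two bands each.
source = [[[0.0, 2.0], [10.0, 14.0]]]
target = [[[1.0, 3.0, 5.0], [0.0, 1.0, 2.0]]]
adapted = match_band_stats(target[0], band_stats(target), band_stats(source))
print(band_stats([adapted]))  # close to the source statistics
```

Because the transformation acts only on the input features, the adapted spectrogram can be passed to an already trained classifier without retraining, which is the model-agnostic property the abstract emphasizes.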
