Visual acoustic matching (VAM) is crucial for enhancing immersive experiences, while dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the use of the abundant unpaired data available. In this paper, we introduce MVSD, a mutual learning framework based on diffusion models. MVSD treats the two tasks symmetrically, exploiting their reciprocal relationship to learn from the inverse task and to mitigate data scarcity. Furthermore, we employ diffusion models as the foundational conditional converters to circumvent the training instability and over-smoothing drawbacks of conventional GAN architectures. Specifically, MVSD employs two converters: one for VAM, called the reverberator, and one for dereverberation, called the dereverberator. The dereverberator judges whether the reverberant audio generated by the reverberator sounds as though it were recorded in the conditional visual scenario, and vice versa. By forming a closed loop, the two converters generate informative feedback signals that optimize the inverse tasks, even with easily acquired one-way unpaired data. Extensive experiments on two standard benchmarks, SoundSpaces-Speech and Acoustic AVSpeech, demonstrate that our framework improves the performance of both the reverberator and the dereverberator and better matches the specified visual scenarios.
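
To make the closed-loop idea concrete, the following is a minimal, hypothetical sketch of how two converters could supervise each other from one-way unpaired data via round-trip feedback. It is not the paper's actual method: MVSD uses diffusion-model converters and its own feedback signals, whereas this toy replaces them with simple MLPs and an L1 round-trip loss; all names (Converter, mutual_step) are illustrative placeholders.

```python
# Illustrative sketch only: toy MLP converters and an L1 round-trip loss
# stand in for MVSD's diffusion-based reverberator/dereverberator and its
# actual feedback signals.
import torch
import torch.nn as nn

class Converter(nn.Module):
    """Toy conditional converter: audio features + visual condition -> audio features."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([audio, visual], dim=-1))

reverberator = Converter()    # VAM: anechoic audio + scene -> reverberant audio
dereverberator = Converter()  # dereverberation: reverberant audio + scene -> anechoic audio

def mutual_step(clean, reverberant, visual, opt_r, opt_d):
    # Loop 1: reverberate unpaired clean audio, then let the dereverberator
    # undo it; the round-trip error acts as feedback for the reverberator.
    fake_rev = reverberator(clean, visual)
    recon_clean = dereverberator(fake_rev, visual)
    loss_r = nn.functional.l1_loss(recon_clean, clean)

    # Loop 2: dereverberate unpaired reverberant audio, re-reverberate it,
    # and score the round trip as feedback for the dereverberator.
    fake_clean = dereverberator(reverberant, visual)
    recon_rev = reverberator(fake_clean, visual)
    loss_d = nn.functional.l1_loss(recon_rev, reverberant)

    opt_r.zero_grad(); opt_d.zero_grad()
    (loss_r + loss_d).backward()
    opt_r.step(); opt_d.step()
    return loss_r.item(), loss_d.item()
```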