Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Adrian Meise

Diminishing Domain Mismatch for DNN-Based Acoustic Distance Estimation via Stochastic Room Reverberation Models

Aug 26, 2024

Tobias Gburrek, Adrian Meise, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

Abstract:The room impulse response (RIR) encodes, among others, information about the distance of an acoustic source from the sensors. Deep neural networks (DNNs) have been shown to be able to extract that information for acoustic distance estimation. Since there exists only a very limited amount of annotated data, e.g., RIRs with distance information, training a DNN for acoustic distance estimation has to rely on simulated RIRs, resulting in an unavoidable mismatch to RIRs of real rooms. In this contribution, we show that this mismatch can be reduced by a novel combination of geometric and stochastic modeling of RIRs, resulting in a significantly improved distance estimation accuracy.

* Accepted for publication IWAENC 2024

Via

Access Paper or Ask Questions

Investigating Speaker Embedding Disentanglement on Natural Read Speech

Aug 08, 2023

Michael Kuhlmann, Adrian Meise, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Figure 1 for Investigating Speaker Embedding Disentanglement on Natural Read Speech

Figure 2 for Investigating Speaker Embedding Disentanglement on Natural Read Speech

Figure 3 for Investigating Speaker Embedding Disentanglement on Natural Read Speech

Figure 4 for Investigating Speaker Embedding Disentanglement on Natural Read Speech

Abstract:Disentanglement is the task of learning representations that identify and separate factors that explain the variation observed in data. Disentangled representations are useful to increase the generalizability, explainability, and fairness of data-driven models. Only little is known about how well such disentanglement works for speech representations. A major challenge when tackling disentanglement for speech representations are the unknown generative factors underlying the speech signal. In this work, we investigate to what degree speech representations encoding speaker identity can be disentangled. To quantify disentanglement, we identify acoustic features that are highly speaker-variant and can serve as proxies for the factors of variation underlying speech. We find that disentanglement of the speaker embedding is limited when trained with standard objectives promoting disentanglement but can be improved over vanilla representation learning to some extent.

* To be published at 15th ITG conference on speech communication

Via

Access Paper or Ask Questions