Abstract: Optical fiber sensing is a technology that detects audio, vibrations, and temperature using an optical fiber; sensing of audio and vibrations in particular is called distributed acoustic sensing (DAS). In DAS, the observed multichannel data suffers from severe noise owing to optical noise and the installation conditions. Conventional methods for denoising DAS data are based on either signal processing or deep neural networks (DNNs). Signal-processing-based methods are interpretable, i.e., not black boxes. DNN-based methods offer flexibility in designing network architectures and objective functions, that is, priors. However, no existing DAS study balances interpretability with flexibility of priors, and DNN-based methods generally require a large amount of training data. To address these problems, we propose a signal-processing-based denoising method with a DNN-like structure. As DAS priors, we employ spatial knowledge, namely low-rankness and channel-dependent sensitivity, through the DNN-like structure. Results on fiber-acoustic sensing show that the proposed method outperforms conventional methods and is robust to the assumed number of spatial ranks. Moreover, the optimized parameters of the proposed method reflect the channel sensitivity, demonstrating its interpretability.
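The two spatial priors named above can be illustrated with a minimal sketch (not the authors' method): denoising channels-by-time DAS data by weighting each channel by an assumed sensitivity, truncating to a low spatial rank via the SVD, and undoing the weighting. The function name, the equal-sensitivity default, and the test signal are all hypothetical.

```python
import numpy as np

# Illustrative sketch, NOT the proposed method: low-rank denoising of
# multichannel DAS data with hypothetical channel-dependent sensitivities.
# X: (channels x time) observed data; r: assumed number of spatial ranks.
def lowrank_denoise(X, r, sensitivity=None):
    if sensitivity is None:
        sensitivity = np.ones(X.shape[0])   # hypothetical: equal sensitivity
    W = np.diag(sensitivity)                # channel-dependent weighting
    U, s, Vt = np.linalg.svd(W @ X, full_matrices=False)
    s[r:] = 0.0                             # low-rank prior: keep r components
    return np.linalg.inv(W) @ (U * s) @ Vt  # undo the channel weighting

# Toy example: a rank-1 spatial pattern buried in sensor noise.
rng = np.random.default_rng(0)
clean = np.outer(rng.standard_normal(8), np.sin(np.linspace(0, 10, 200)))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = lowrank_denoise(noisy, r=1)
```

In this toy setting the rank-1 truncation discards most of the noise energy while retaining the shared spatial pattern, which is the intuition behind using low-rankness as a DAS prior.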
Abstract: In many sound event detection (SED) methods, a segmented time frame is regarded as one data sample for model training. The duration of a sound event greatly depends on its class, e.g., the sound event "fan" has a long duration, whereas the sound event "mouse clicking" is instantaneous. Thus, the difference in duration between sound event classes results in a serious data imbalance in SED. Moreover, most sound events occur only occasionally; therefore, there are many more inactive time frames than active ones. This causes a second severe data imbalance, between active and inactive frames. In this paper, we investigate the impact of sound duration and inactive frames on SED performance by introducing four loss functions: simple reweighting loss, inverse frequency loss, asymmetric focal loss, and focal batch Tversky loss. We then provide insights into how to tackle these imbalance problems.
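To make the active/inactive imbalance concrete, here is a minimal sketch of an asymmetric focal variant of frame-level binary cross-entropy. It is only an illustration of the idea, not the paper's exact formulation: the focusing exponent is assumed to apply only to inactive frames, so confidently predicted inactive frames contribute little to the loss.

```python
import numpy as np

# Sketch (assumptions: frame-level binary targets y, sigmoid outputs p).
# The focusing term p**gamma down-weights easy inactive frames, countering
# the dominance of inactive frames over active ones in SED training.
def asymmetric_focal_loss(p, y, gamma=2.0, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    active = -y * np.log(p)                              # active frames: plain BCE
    inactive = -(1 - y) * (p ** gamma) * np.log(1 - p)   # focus on hard negatives
    return np.mean(active + inactive)

y = np.array([1.0, 0.0, 0.0, 0.0])  # one active frame, three inactive
p = np.array([0.9, 0.1, 0.1, 0.8])  # last inactive frame is a hard negative
loss = asymmetric_focal_loss(p, y)
```

For an easy inactive frame (p = 0.1), the focal term scales the plain cross-entropy by p**gamma = 0.01, so the many trivially inactive frames no longer dominate the gradient.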
Abstract: This paper proposes a determined blind source separation method using Bayesian non-parametric modelling of sources. Conventionally, source signals are separated from a given set of mixture signals by modelling them using non-negative matrix factorization (NMF). In NMF, however, a latent variable signifying model complexity must be specified appropriately to avoid over-fitting or under-fitting. As real-world sources can be of varying and unknown complexity, we propose a Bayesian non-parametric framework that is invariant to such latent variables. We show that our proposed method adapts to different source complexities, whereas conventional methods require parameter tuning for optimal separation.
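The latent complexity variable at issue is the number of NMF bases, which must be fixed before fitting. A minimal sketch (illustrative only, not the paper's method) of Euclidean-distance NMF with multiplicative updates makes this explicit: K is a hand-set hyperparameter, which is exactly what the Bayesian non-parametric framing seeks to avoid.

```python
import numpy as np

# Sketch: basic NMF via multiplicative updates (Euclidean cost).
# V ~ W @ H with V: (freq x time), W: (freq x K), H: (K x time).
# K, the number of bases, encodes model complexity and must be chosen
# in advance -- the latent variable the abstract argues should be inferred.
def nmf(V, K, n_iter=200, seed=0, eps=1e-9):
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update bases
    return W, H

# Toy non-negative "spectrogram"; K=4 is an arbitrary, hand-picked choice.
V = np.abs(np.random.default_rng(1).standard_normal((16, 32)))
W, H = nmf(V, K=4)
```

Too small a K under-fits the sources; too large a K over-fits noise. Since the multiplicative updates preserve non-negativity, the sketch also shows why the factors remain valid spectral bases and activations throughout optimization.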