Abstract:In this article, we describe Conditioned Localizer and Classifier (CoLoC) which is a novel solution for Sound Event Localization and Detection (SELD). The solution constitutes of two stages: the localization is done first and is followed by classification conditioned by the output of the localizer. In order to resolve the problem of the unknown number of sources we incorporate the idea borrowed from Sequential Set Generation (SSG). Models from both stages are SELDnet-like CRNNs, but with single outputs. Conducted reasoning shows that such two single-output models are fit for SELD task. We show that our solution improves on the baseline system in most metrics on the STARSS22 Dataset.
Abstract:In this paper, we introduce ID-Conditioned Auto-Encoder for unsupervised anomaly detection. Our method is an adaptation of the Class-Conditioned Auto-Encoder (C2AE) designed for the open-set recognition. Assuming that non-anomalous samples constitute of distinct IDs, we apply Conditioned Auto-Encoder with labels provided by these IDs. Opposed to C2AE, our approach omits the classification subtask and reduces the learning process to the single run. We simplify the learning process further by fixing a constant vector as the target for non-matching labels. We apply our method in the context of sounds for machine condition monitoring. We evaluate our method on the ToyADMOS and MIMII datasets from the DCASE 2020 Challenge Task 2. We conduct an ablation study to indicate which steps of our method influences results the most.
Abstract:In this paper, we describe our method for DCASE2019 task3: Sound Event Localization and Detection (SELD). We use four CRNN SELDnet-like single output models which run in a consecutive manner to recover all possible information of occurring events. We decompose the SELD task into estimating number of active sources, estimating direction of arrival of a single source, estimating direction of arrival of the second source where the direction of the first one is known and a multi-label classification task. We use custom consecutive ensemble to predict events' onset, offset, direction of arrival and class. The proposed approach is evaluated on the TAU Spatial Sound Events 2019 - Ambisonic and it is compared with other participants' submissions.