Yulong Wan

Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning

Sep 14, 2023

Yucong Zhang, Hongbin Suo, Yulong Wan, Ming Li

Abstract: This paper proposes an approach for anomalous sound detection that incorporates outlier exposure and inlier modeling within a unified framework by multitask learning. While outlier exposure-based methods can extract features efficiently, they are not robust. Inlier modeling generates robust features, but those features are less effective. Recently, serial approaches have been proposed to combine these two methods, but they still require a separate training step for normal data modeling. To overcome these limitations, we use multitask learning to train a conformer-based encoder for outlier-aware inlier modeling. Moreover, our approach provides multi-scale scores for detecting anomalies. Experimental results on the MIMII and DCASE 2020 task 2 datasets show that our approach outperforms state-of-the-art single-model systems and achieves results comparable to top-ranked multi-system ensembles.

* accepted at INTERSPEECH 2023

Task-Agnostic Structured Pruning of Speech Representation Models

Jun 02, 2023

Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan

Abstract: Self-supervised pre-trained models such as Wav2vec2, HuBERT, and WavLM have been shown to significantly improve many speech tasks. However, their large memory footprint and high computational requirements hinder their industrial applicability. Structured pruning is a hardware-friendly model compression technique but usually results in a larger loss of accuracy. In this paper, we propose a fine-grained attention head pruning method to compensate for this performance degradation. In addition, we introduce the straight-through estimator into the L0 regularization to further accelerate the pruned model. Experiments on the SUPERB benchmark show that our model achieves performance comparable to the dense model on multiple tasks and outperforms the Wav2vec 2.0 base model on average, with 72% fewer parameters and 2 times faster inference.

* Accepted by INTERSPEECH 2023
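
The abstract does not spell out its gate formulation. A common way to make an L0 penalty trainable is the hard-concrete gate of Louizos et al. (2018); the sketch below assumes that formulation and its usual hyperparameters (beta = 2/3, gamma = -0.1, zeta = 1.1), none of which are taken from this paper. `ste_gate` returns a binarized gate for the forward pass together with the gradient of the underlying continuous gate, which is the quantity a straight-through estimator propagates backward; `expected_l0` is the differentiable expected number of active gates. All function names are hypothetical.

```python
import math

# Hard-concrete hyperparameters (assumed; common defaults from Louizos et al., 2018).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def soft_gate(log_alpha: float) -> float:
    """Deterministic hard-concrete gate: stretched sigmoid clamped to [0, 1]."""
    stretched = sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))


def ste_gate(log_alpha: float) -> tuple[float, float]:
    """Return (forward value, backward gradient) for one gate.

    Forward: the gate is binarized to exactly 0.0 or 1.0.
    Backward (straight-through): reuse d(soft_gate)/d(log_alpha) as if the
    binarization were the identity; the gradient is 0 where the clamp is active.
    """
    s = sigmoid(log_alpha)
    stretched = s * (ZETA - GAMMA) + GAMMA
    hard = 1.0 if soft_gate(log_alpha) > 0.5 else 0.0
    grad = s * (1.0 - s) * (ZETA - GAMMA) if 0.0 < stretched < 1.0 else 0.0
    return hard, grad


def expected_l0(log_alphas: list[float]) -> float:
    """Expected count of non-zero gates: sum of P(gate != 0) over all gates."""
    shift = BETA * math.log(-GAMMA / ZETA)
    return sum(sigmoid(la - shift) for la in log_alphas)
```

Because the forward value is exactly 0 or 1, a head whose gate is 0 contributes nothing in this sketch and could be dropped from the model at inference time.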

Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models

Oct 13, 2022

Haoyu Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan

Abstract: Labeled audio data is insufficient to build satisfactory speech recognition systems for most of the world's languages. Some zero-resource methods attempt phoneme- or word-level speech recognition without labeled audio data in the target language, but their error rates are usually too high for real-world use. Recently, the representation ability of self-supervised pre-trained models has been found to be extremely beneficial for zero-resource phoneme recognition. To the best of our knowledge, this paper is the first attempt to extend pre-trained models to word-level zero-resource speech recognition. This is done by fine-tuning the pre-trained models on IPA phoneme transcriptions and decoding with a language model trained on extra text. Experiments on Wav2vec 2.0 and HuBERT models show that this method achieves a word error rate below 20% on some languages, with an average error rate of 33.77% across 8 languages.

* accepted by ISCSLP 2022
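
Word error rate is conventionally the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below implements that standard definition; the function name is illustrative, and it does not reproduce the paper's scoring pipeline (for example, any text normalization applied before scoring).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion, giving 2/6 (about 0.333).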
