
Title: CT-SAT: Contextual Transformer for Sequential Audio Tagging

Mar 22, 2022

Yuanbo Hou, Zhaoyi Liu, Bo Kang, Yun Wang, Dick Botteldooren

Figures 1-4 for CT-SAT: Contextual Transformer for Sequential Audio Tagging


Abstract: Sequential audio event tagging provides not only the types of audio events in a clip, but also the order in which the events occur and how many events occur. Most previous work on audio event sequence analysis relies on connectionist temporal classification (CTC). However, CTC's conditional independence assumption prevents it from effectively learning correlations between diverse audio events. This paper makes a first attempt to introduce the Transformer into sequential audio tagging, since Transformers perform well in sequence-related tasks. To better exploit the contextual information of audio event sequences, we draw on the idea of bidirectional recurrent neural networks and propose a contextual Transformer (cTransformer) with a bidirectional decoder that exploits both the forward and backward information of event sequences. Experiments on a real-life polyphonic audio dataset show that, compared to CTC-based methods, the cTransformer effectively combines the fine-grained acoustic representations from the encoder with coarse-grained audio event cues, using contextual information to successfully recognize and predict audio event sequences.
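To make the "forward and backward information" of an event sequence concrete, here is a minimal sketch of how teacher-forcing inputs and targets could be prepared for a forward decoder and a backward decoder. The abstract does not say how the cTransformer builds its decoder targets; this sketch assumes the common setup in which the backward decoder is trained on the reversed event sequence. The function names `decoder_pairs` and `sequence_summary` and the `<sos>`/`<eos>` tokens are hypothetical. The sketch also shows the type, order, and count information that a sequential tag carries.

```python
from typing import List, Tuple

# Hypothetical start/end-of-sequence tokens (assumption, not from the paper).
SOS, EOS = "<sos>", "<eos>"

Pair = Tuple[List[str], List[str]]


def decoder_pairs(events: List[str]) -> Tuple[Pair, Pair]:
    """Return (input, target) token lists for a forward and a backward decoder.

    Assumed teacher-forcing layout: the decoder input is the target sequence
    shifted right by one position (prefixed with <sos>), and the target ends
    with <eos>. The backward decoder sees the event order reversed.
    """
    forward = list(events)
    backward = list(reversed(events))
    forward_pair = ([SOS] + forward, forward + [EOS])
    backward_pair = ([SOS] + backward, backward + [EOS])
    return forward_pair, backward_pair


def sequence_summary(events: List[str]) -> Tuple[List[str], int]:
    """Return the distinct event types (in order of first occurrence) and the event count."""
    types = list(dict.fromkeys(events))  # dict preserves insertion order
    return types, len(events)


if __name__ == "__main__":
    clip_events = ["speech", "dog", "speech", "car"]  # illustrative labels
    (f_in, f_tgt), (b_in, b_tgt) = decoder_pairs(clip_events)
    print("forward input :", f_in)
    print("forward target:", f_tgt)
    print("backward input :", b_in)
    print("backward target:", b_tgt)
    print("types, count  :", sequence_summary(clip_events))
```

Training on both directions gives each decoder a different view of the same sequence: at any position the forward decoder has seen only the earlier events, while the backward decoder has seen only the later ones. How the two directions are combined at inference time is not described in the abstract, so the sketch leaves that step out.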

* Submitted to Interspeech 2022
