Pujin Shi

Audio-Guided Fusion Techniques for Multimodal Emotion Analysis

Sep 08, 2024

Pujin Shi, Fei Gao

Figures 1-4 for Audio-Guided Fusion Techniques for Multimodal Emotion Analysis

Abstract: In this paper, we propose a solution for the semi-supervised learning track (MER-SEMI) in MER2024. First, to enhance the performance of the feature extractors on sentiment classification tasks, we fine-tuned the video and text feature extractors, specifically CLIP-vit-large and Baichuan-13B, using labeled data. This approach effectively preserves the original emotional information conveyed in the videos. Second, we propose an Audio-Guided Transformer (AGT) fusion mechanism, which leverages the robustness of Hubert-large and shows superior effectiveness in fusing both inter-channel and intra-channel information. Third, to enhance the accuracy of the model, we iteratively apply self-supervised learning, using high-confidence predictions on unlabeled data as pseudo-labels. Finally, through black-box probing, we discovered an imbalanced data distribution between the training and test sets. We therefore adopt a prior-knowledge-based voting mechanism. The results demonstrate the effectiveness of our strategy, which ultimately earned us third place in the MER-SEMI track.
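
The pseudo-labeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_pseudo_labels`, the 0.9 confidence threshold, and the toy probabilities are all assumptions chosen for illustration. The idea shown is only the general one of keeping unlabeled samples whose top predicted class probability is high enough to be treated as a label in the next training round.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) for confident predictions."""
    selected = []
    for idx, row in enumerate(probs):
        # Index of the most probable class for this sample.
        best_class = max(range(len(row)), key=row.__getitem__)
        # Keep the sample only if the model is confident enough.
        if row[best_class] >= threshold:
            selected.append((idx, best_class))
    return selected


# Toy class probabilities for three unlabeled clips (made-up values).
unlabeled_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.02, 0.93],
]
pseudo = select_pseudo_labels(unlabeled_probs, threshold=0.9)
print(pseudo)  # [(0, 0), (2, 2)]
```

In an iterative scheme, the selected pairs would be added to the labeled set, the model retrained, and the selection repeated on the remaining unlabeled data.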
