Huihao Huang

Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion

Mar 31, 2025

Jiagen Li, Rui Yu, Huihao Huang, Huaicheng Yan

Abstract: Multimodal Emotion Recognition in Conversations (MERC) identifies emotional states across text, audio and video, which is essential for intelligent dialogue systems and opinion analysis. Existing methods emphasize heterogeneous modal fusion directly for cross-modal integration, but often suffer from disorientation in multimodal learning due to modal heterogeneity and lack of instructive guidance. In this work, we propose SUMMER, a novel heterogeneous multimodal integration framework leveraging Mixture of Experts with Hierarchical Cross-modal Fusion and Interactive Knowledge Distillation. Key components include a Sparse Dynamic Mixture of Experts (SDMoE) for capturing dynamic token-wise interactions, a Hierarchical Cross-Modal Fusion (HCMF) for effective fusion of heterogeneous modalities, and Interactive Knowledge Distillation (IKD), which uses a pre-trained unimodal teacher to guide multimodal fusion in latent and logit spaces. Experiments on IEMOCAP and MELD show SUMMER outperforms state-of-the-art methods, particularly in recognizing minority and semantically similar emotions.
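The abstract does not give the formulas behind SDMoE or IKD, so the following is only a generic sketch of two standard ideas those components appear to build on: sparse top-k expert gating for a single token, and temperature-scaled distillation in logit space. The function names, the choice of `k`, and the temperature value are assumptions for illustration and are not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum for x in scaled]


def top_k_gate(gate_logits, k):
    """Hypothetical sparse gate: keep the k highest-scoring experts for one
    token and renormalise their weights with a softmax; the rest get 0."""
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    kept = order[:k]
    kept_log_probs = log_softmax([gate_logits[i] for i in kept])
    weights = [0.0] * len(gate_logits)
    for i, lp in zip(kept, kept_log_probs):
        weights[i] = math.exp(lp)
    return weights


def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Generic logit-space distillation term: KL(teacher || student) on
    temperature-softened distributions, scaled by temperature squared.
    This is the common textbook form, assumed here, not the paper's loss."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Example: route one token across four experts, keeping the top two.
gate_weights = top_k_gate([2.0, 0.5, 1.0, -1.0], k=2)

# Example: a unimodal teacher and a multimodal student over three emotion classes.
loss = logit_distillation_loss([3.0, 1.0, 0.2], [2.0, 1.5, 0.1])
```

In this sketch the gate weights sum to one over the selected experts, and the distillation term is zero when student and teacher logits agree. A latent-space counterpart, which the abstract also mentions, would compare intermediate features rather than logits and is not shown here.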
