Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yangchen Yu

DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

Oct 11, 2024

Jia Li, Yangchen Yu, Yin Chen, Yu Zhang, Peng Jia, Yunbo Xu, Ziqiang Li, Meng Wang, Richang Hong

Figure 1 for DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

Figure 2 for DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

Figure 3 for DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

Figure 4 for DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

Abstract:Engagement estimation plays a crucial role in understanding human social behaviors, attracting increasing research interests in fields such as affective computing and human-computer interaction. In this paper, we propose a Dialogue-Aware Transformer framework (DAT) with Modality-Group Fusion (MGF), which relies solely on audio-visual input and is language-independent, for estimating human engagement in conversations. Specifically, our method employs a modality-group fusion strategy that independently fuses audio and visual features within each modality for each person before inferring the entire audio-visual content. This strategy significantly enhances the model's performance and robustness. Additionally, to better estimate the target participant's engagement levels, the introduced Dialogue-Aware Transformer considers both the participant's behavior and cues from their conversational partners. Our method was rigorously tested in the Multi-Domain Engagement Estimation Challenge held by MultiMediate'24, demonstrating notable improvements in engagement-level regression precision over the baseline model. Notably, our approach achieves a CCC score of 0.76 on the NoXi Base test set and an average CCC of 0.64 across the NoXi Base, NoXi-Add, and MPIIGI test sets.

* 1st Place on the NoXi Base dataset in the Multi-Domain Engagement Estimation Challenge held by MultiMediate 24, accepted by ACM Multimedia 2024. The source code is available at \url{https://github.com/MSA-LMC/DAT}

Via

Access Paper or Ask Questions

Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers

Mar 16, 2023

Jia Li, Yin Chen, Xuesong Zhang, Jiantao Nie, Yangchen Yu, Ziqiang Li, Meng Wang, Richang Hong

Abstract:In this paper, we present our solutions to the two sub-challenges of Affective Behavior Analysis in the wild (ABAW) 2023: the Emotional Reaction Intensity (ERI) Estimation Challenge and Expression (Expr) Classification Challenge. ABAW 2023 focuses on the problem of affective behavior analysis in the wild, with the goal of creating machines and robots that have the ability to understand human feelings, emotions and behaviors, which can effectively contribute to the advent of a more intelligent future. In our work, we use different models and tools for the Hume-Reaction dataset to extract features of various aspects, such as audio features, video features, etc. By analyzing, combining, and studying these multimodal features, we effectively improve the accuracy of the model for multimodal sentiment prediction. For the Emotional Reaction Intensity (ERI) Estimation Challenge, our method shows excellent results with a Pearson coefficient on the validation dataset, exceeding the baseline method by 84 percent.

* Solutions of HFUT-CVers Team at the 5th ABAW Competition (CVPR 2023 workshop)

Via

Access Paper or Ask Questions