Liviu Cristian Dutu

Temporal aggregation of audio-visual modalities for emotion recognition

Jul 08, 2020

Andreea Birhala, Catalin Nicolae Ristea, Anamaria Radoi, Liviu Cristian Dutu

(Figures 1 to 4 of the paper not reproduced.)

Abstract: Emotion recognition plays a pivotal role in affective computing and human-computer interaction. Current technological developments have increased the possibilities of collecting data about a person's emotional state. In general, human perception of the emotion conveyed by a subject is based on vocal and visual information gathered in the first seconds of interaction with that subject. Consequently, integrating verbal (i.e., speech) and non-verbal (i.e., image) information appears to be the preferred choice in most current approaches to emotion recognition. In this paper, we propose a multimodal fusion technique for emotion recognition that combines audio and visual modalities from a temporal window, using a different temporal offset for each modality. We show that the proposed method outperforms other methods from the literature as well as human accuracy ratings. The experiments are conducted on the open-access multimodal dataset CREMA-D.
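The idea of drawing each modality from the same temporal window but with its own offset can be illustrated with a small sketch. This is not the authors' method: it assumes both modalities are already reduced to per-frame feature vectors at a common frame rate, uses mean pooling as the temporal aggregation, and fuses by concatenation. The function names (`window`, `mean_pool`, `fuse_with_offsets`) and all parameters are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length feature vectors element-wise."""
    if not frames:
        raise ValueError("cannot pool an empty window")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def window(seq: Sequence[Vector], center: int, offset: int, length: int) -> List[Vector]:
    """Return up to `length` frames starting at center + offset, clipped to the sequence bounds."""
    start = max(0, center + offset)
    end = min(len(seq), start + length)
    return list(seq[start:end])


def fuse_with_offsets(audio: Sequence[Vector], visual: Sequence[Vector],
                      center: int, length: int,
                      audio_offset: int, visual_offset: int) -> Vector:
    """Hypothetical fusion: pool each modality over its own shifted window, then concatenate."""
    a = mean_pool(window(audio, center, audio_offset, length))
    v = mean_pool(window(visual, center, visual_offset, length))
    return a + v  # list concatenation: [audio features..., visual features...]


# Toy usage: 10 frames, 2-dim audio features and 1-dim visual features per frame.
audio = [[float(t), 0.0] for t in range(10)]
visual = [[float(t)] for t in range(10)]
# The visual window is shifted two frames earlier than the audio window.
fused = fuse_with_offsets(audio, visual, center=4, length=3,
                          audio_offset=0, visual_offset=-2)
print(fused)  # [5.0, 0.0, 3.0]
```

Here the audio window covers frames 4 to 6 and the visual window covers frames 2 to 4, so the fused vector mixes features from slightly different moments. In a real system the offsets and window length would be tuned on validation data, and the aggregation could be a learned model rather than an average.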

Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks

Feb 29, 2020

Nicolae-Catalin Ristea, Liviu Cristian Dutu, Anamaria Radoi

(Figures 1 to 4 of the paper not reproduced.)

Abstract: Emotion recognition has become an important field of research in the human-computer interaction domain. The latest advancements in the field show that combining visual and audio information leads to better results than using either source of information separately. From a visual point of view, a human emotion can be recognized by analyzing the person's facial expression. More precisely, a human emotion can be described through a combination of several Facial Action Units. In this paper, we propose a system based on deep Convolutional Neural Networks that recognizes emotions with a high accuracy rate and in real time. To increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources, i.e., visual and audio. Experimental results show the effectiveness of the proposed scheme for emotion recognition and the importance of combining visual with audio data.
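The abstract does not say how the visual and audio outputs are fused. One common option is late fusion, where each network produces class scores and the two probability vectors are combined by a weighted average. The sketch below shows that option only as an illustration; the label set, the `late_fusion` and `predict` names, and the `visual_weight` parameter are assumptions, not details from the paper.

```python
import math
from typing import List, Sequence

# Illustrative label set (assumption; not taken from the paper).
LABELS = ["anger", "disgust", "fear", "happy", "neutral", "sad"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw network scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(visual_probs: Sequence[float], audio_probs: Sequence[float],
                visual_weight: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of per-modality class probabilities."""
    if len(visual_probs) != len(audio_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    w = visual_weight
    return [w * v + (1.0 - w) * a for v, a in zip(visual_probs, audio_probs)]


def predict(fused: Sequence[float], labels: Sequence[str] = LABELS) -> str:
    """Return the label with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


# Toy usage: the visual branch leans towards "anger", the audio branch towards "happy".
visual = [0.6, 0.1, 0.1, 0.1, 0.05, 0.05]
audio = [0.1, 0.05, 0.05, 0.7, 0.05, 0.05]
print(predict(late_fusion(visual, audio)))                     # happy
print(predict(late_fusion(visual, audio, visual_weight=0.8)))  # anger
```

With equal weights the more confident audio branch decides the label; raising `visual_weight` lets the visual branch dominate. Feature-level (early) fusion, where intermediate CNN features are concatenated before a shared classifier, is an alternative that this sketch does not cover.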
