Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Andrey V. Savchenko

HSEmotion Team at ABAW-8 Competition: Audiovisual Ambivalence/Hesitancy, Emotional Mimicry Intensity and Facial Expression Recognition

Mar 13, 2025

Andrey V. Savchenko

Abstract:This article presents our results for the eighth Affective Behavior Analysis in-the-Wild (ABAW) competition. We combine facial emotional descriptors extracted by pre-trained models, namely, our EmotiEffLib library, with acoustic features and embeddings of texts recognized from speech. The frame-level features are aggregated and fed into simple classifiers, e.g., multi-layered perceptron (feed-forward neural network with one hidden layer), to predict ambivalence/hesitancy and facial expressions. In the latter case, we also use the pre-trained facial expression recognition model to select high-score video frames and prevent their processing with a domain-specific video classifier. The video-level prediction of emotional mimicry intensity is implemented by simply aggregating frame-level features and training a multi-layered perceptron. Experimental results for three tasks from the ABAW challenge demonstrate that our approach significantly increases validation metrics compared to existing baselines.

* submitted to ABAW CVPR 2025 Workshop

Via

Access Paper or Ask Questions

Neural Click Models for Recommender Systems

Sep 30, 2024

Mikhail Shirokikh, Ilya Shenbin, Anton Alekseev, Anna Volodkevich, Alexey Vasilev, Andrey V. Savchenko, Sergey Nikolenko

Abstract:We develop and evaluate neural architectures to model the user behavior in recommender systems (RS) inspired by click models for Web search but going beyond standard click models. Proposed architectures include recurrent networks, Transformer-based models that alleviate the quadratic complexity of self-attention, adversarial and hierarchical architectures. Our models outperform baselines on the ContentWise and RL4RS datasets and can be used in RS simulators to model user response for RS evaluation and pretraining.

Via

Access Paper or Ask Questions

HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Jul 18, 2024

Andrey V. Savchenko

Figure 1 for HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Figure 2 for HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Figure 3 for HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Figure 4 for HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition

Abstract:In this paper, we describe the results of the HSEmotion team in two tasks of the seventh Affective Behavior Analysis in-the-wild (ABAW) competition, namely, multi-task learning for simultaneous prediction of facial expression, valence, arousal, and detection of action units, and compound expression recognition. We propose an efficient pipeline based on frame-level facial feature extractors pre-trained in multi-task settings to estimate valence-arousal and basic facial expressions given a facial photo. We ensure the privacy-awareness of our techniques by using the lightweight architectures of neural networks, such as MT-EmotiDDAMFN, MT-EmotiEffNet, and MT-EmotiMobileFaceNet, that can run even on a mobile device without the need to send facial video to a remote server. It was demonstrated that a significant step in improving the overall accuracy is the smoothing of neural network output scores using Gaussian or box filters. It was experimentally demonstrated that such a simple post-processing of predictions from simple blending of two top visual models improves the F1-score of facial expression recognition up to 7%. At the same time, the mean Concordance Correlation Coefficient (CCC) of valence and arousal is increased by up to 1.25 times compared to each model's frame-level predictions. As a result, our final performance score on the validation set from the multi-task learning challenge is 4.5 times higher than the baseline (1.494 vs 0.32).

* 16 pages, 7 figures, 9 tables

Via

Access Paper or Ask Questions

HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction

Mar 18, 2024

Andrey V. Savchenko

Abstract:This article presents our results for the sixth Affective Behavior Analysis in-the-wild (ABAW) competition. To improve the trustworthiness of facial analysis, we study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task. In particular, we introduce several lightweight models based on MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures trained in multi-task scenarios to recognize facial expressions, valence, and arousal on static photos. These neural networks extract frame-level features fed into a simple classifier, e.g., linear feed-forward neural network, to predict emotion intensity, compound expressions, action units, facial expressions, and valence/arousal. Experimental results for five tasks from the sixth ABAW challenge demonstrate that our approach lets us significantly improve quality metrics on validation sets compared to existing non-ensemble techniques.

* 10 pages, 1 figure, 8 tables

Via

Access Paper or Ask Questions

EmotiEffNet Facial Features in Uni-task Emotion Recognition in Video at ABAW-5 competition

Mar 16, 2023

Andrey V. Savchenko

Figure 1 for EmotiEffNet Facial Features in Uni-task Emotion Recognition in Video at ABAW-5 competition

Figure 2 for EmotiEffNet Facial Features in Uni-task Emotion Recognition in Video at ABAW-5 competition

Figure 3 for EmotiEffNet Facial Features in Uni-task Emotion Recognition in Video at ABAW-5 competition

Figure 4 for EmotiEffNet Facial Features in Uni-task Emotion Recognition in Video at ABAW-5 competition

Abstract:In this article, the results of our team for the fifth Affective Behavior Analysis in-the-wild (ABAW) competition are presented. The usage of the pre-trained convolutional networks from the EmotiEffNet family for frame-level feature extraction is studied. In particular, we propose an ensemble of a multi-layered perceptron and the LightAutoML-based classifier. The post-processing by smoothing the results for sequential frames is implemented. Experimental results for the large-scale Aff-Wild2 database demonstrate that our model achieves a much greater macro-averaged F1-score for facial expression recognition and action unit detection and concordance correlation coefficients for valence/arousal estimation when compared to baseline.

* 7 pages; 5 figures; 3 tables

Via

Access Paper or Ask Questions

HSE-NN Team at the 4th ABAW Competition: Multi-task Emotion Recognition and Learning from Synthetic Images

Jul 21, 2022

Andrey V. Savchenko

Figure 1 for HSE-NN Team at the 4th ABAW Competition: Multi-task Emotion Recognition and Learning from Synthetic Images

Figure 2 for HSE-NN Team at the 4th ABAW Competition: Multi-task Emotion Recognition and Learning from Synthetic Images

Figure 3 for HSE-NN Team at the 4th ABAW Competition: Multi-task Emotion Recognition and Learning from Synthetic Images

Figure 4 for HSE-NN Team at the 4th ABAW Competition: Multi-task Emotion Recognition and Learning from Synthetic Images

Abstract:In this paper, we present the results of the HSE-NN team in the 4th competition on Affective Behavior Analysis in-the-wild (ABAW). The novel multi-task EfficientNet model is trained for simultaneous recognition of facial expressions and prediction of valence and arousal on static photos. The resulting MT-EmotiEffNet extracts visual features that are fed into simple feed-forward neural networks in the multi-task learning challenge. We obtain performance measure 1.3 on the validation set, which is significantly greater when compared to either performance of baseline (0.3) or existing models that are trained only on the s-Aff-Wild2 database. In the learning from synthetic data challenge, the quality of the original synthetic training set is increased by using the super-resolution techniques, such as Real-ESRGAN. Next, the MT-EmotiEffNet is fine-tuned on the new training set. The final prediction is a simple blending ensemble of pre-trained and fine-tuned MT-EmotiEffNets. Our average validation F1 score is 18% greater than the baseline convolutional neural network.

* 13 pages, 3 figures, 8 tables

Via

Access Paper or Ask Questions

Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices

Mar 25, 2022

Andrey V. Savchenko

Figure 1 for Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices

Figure 2 for Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices

Figure 3 for Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices

Figure 4 for Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices

Abstract:In this paper, we consider the problem of real-time video-based facial emotion analytics, namely, facial expression recognition, prediction of valence and arousal and detection of action unit points. We propose the novel frame-level emotion recognition algorithm by extracting facial features with the single EfficientNet model pre-trained on AffectNet. As a result, our approach may be implemented even for video analytics on mobile devices. Experimental results for the large scale Aff-Wild2 database from the third Affective Behavior Analysis in-the-wild (ABAW) Competition demonstrate that our simple model is significantly better when compared to the VggFace baseline. In particular, our method is characterized by 0.15-0.2 higher performance measures for validation sets in uni-task Expression Classification, Valence-Arousal Estimation and Expression Classification. Due to simplicity, our approach may be considered as a new baseline for all four sub-challenges.

* 6 pages, 2 figures, 5 tables

Via

Access Paper or Ask Questions

Facial expression and attributes recognition based on multi-task learning of lightweight neural networks

Mar 31, 2021

Andrey V. Savchenko

Figure 1 for Facial expression and attributes recognition based on multi-task learning of lightweight neural networks

Figure 2 for Facial expression and attributes recognition based on multi-task learning of lightweight neural networks

Figure 3 for Facial expression and attributes recognition based on multi-task learning of lightweight neural networks

Figure 4 for Facial expression and attributes recognition based on multi-task learning of lightweight neural networks

Abstract:In this paper, we examine the multi-task training of lightweight convolutional neural networks for face identification and classification of facial attributes (age, gender, ethnicity) trained on cropped faces without margins. It is shown that it is still necessary to fine-tune these networks in order to predict facial expressions. Several models are presented based on MobileNet, EfficientNet and RexNet architectures. It was experimentally demonstrated that our models are characterized by the state-of-the-art emotion classification accuracy on AffectNet dataset and near state-of-the-art results in age, gender and race recognition for UTKFace dataset. Moreover, it is shown that the usage of our neural network as a feature extractor of facial regions in video frames and concatenation of several statistical functions (mean, max, etc.) leads to 4.5\% higher accuracy than the previously known state-of-the-art single models for AFEW and VGAF datasets from the EmotiW challenges. The models and source code are publicly available at https://github.com/HSE-asavchenko/face-emotion-recognition.

* 13 pages, 2 figures

Via

Access Paper or Ask Questions

Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning

Jan 15, 2020

Andrey V. Savchenko

Figure 1 for Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning

Figure 2 for Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning

Figure 3 for Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning

Figure 4 for Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning

Abstract:In this paper a new formulation of event recognition task is examined: it is required to predict event categories in a gallery of images, for which albums (groups of photos corresponding to a single event) are unknown. We propose the novel two-stage approach. At first, features are extracted in each photo using the pre-trained convolutional neural network. These features are classified individually. The scores of the classifier are used to group sequential photos into several clusters. Finally, the features of photos in each group are aggregated into a single descriptor using neural attention mechanism. This algorithm is optionally extended to improve the accuracy for classification of each image in an album. In contrast to conventional fine-tuning of convolutional neural networks (CNN) we proposed to use image captioning, i.e., generative model that converts images to textual descriptions. They are one-hot encoded and summarized into sparse feature vector suitable for learning of arbitrary classifier. Experimental study with Photo Event Collection and Multi-Label Curation of Flickr Events Dataset demonstrates that our approach is 9-20% more accurate than event recognition on single photos. Moreover, proposed method has 13-16% lower error rate than classification of groups of photos obtained with hierarchical clustering. It is experimentally shown that the image captions trained on Conceptual Captions dataset can be classified more accurately than the features from object detector, though they both are obviously not as rich as the CNN-based features. However, it is possible to combine our approach with conventional CNNs in an ensemble to provide the state-of-the-art results for several event datasets.

* 11 pages, 5 figures

Via

Access Paper or Ask Questions

Compression of Recurrent Neural Networks for Efficient Language Modeling

Feb 06, 2019

Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko

Figure 1 for Compression of Recurrent Neural Networks for Efficient Language Modeling

Figure 2 for Compression of Recurrent Neural Networks for Efficient Language Modeling

Figure 3 for Compression of Recurrent Neural Networks for Efficient Language Modeling

Figure 4 for Compression of Recurrent Neural Networks for Efficient Language Modeling

Abstract:Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks including Long-Short Term Memory models. We make particular attention to the high-dimensional output problem caused by the very large vocabulary size. We focus on effective compression methods in the context of their exploitation on devices: pruning, quantization, and matrix decomposition approaches (low-rank factorization and tensor train decomposition, in particular). For each model we investigate the trade-off between its size, suitability for fast inference and perplexity. We propose a general pipeline for applying the most suitable methods to compress recurrent neural networks for language modeling. It has been shown in the experimental study with the Penn Treebank (PTB) dataset that the most efficient results in terms of speed and compression-perplexity balance are obtained by matrix decomposition techniques.

* 25 pages, 3 tables, 4 figures

Via

Access Paper or Ask Questions