Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jeong Hoon Lim

A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection

Mar 13, 2025

Heng Yim Nicole Oo, Min Hun Lee, Jeong Hoon Lim

Figure 1 for A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection

Figure 2 for A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection

Figure 3 for A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection

Abstract:Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessments by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes an MLP mixer-based model to process unstructured data (i.e. RGB images or images with facial line segments) and a feed-forward neural network to process structured data (i.e. facial landmark coordinates, features of facial expressions, or handcrafted features) for detecting facial palsy. We then contribute to a study to analyze the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 20 facial palsy patients and 20 healthy subjects. Our multimodal fusion model achieved 96.00 F1, which is significantly higher than the feed-forward neural network trained on handcrafted features alone (82.80 F1) and an MLP mixer-based model trained on raw RGB images (89.00 F1).

* PAKDD 2025. arXiv admin note: text overlap with arXiv:2405.16496

Via

Access Paper or Ask Questions

Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy

May 26, 2024

Nicole Heng Yim Oo, Min Hun Lee, Jeong Hoon Lim

Abstract:Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessment by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes unstructured data (i.e. an image frame with facial line segments) and structured data (i.e. features of facial expressions) to detect facial palsy. We then contribute to a study to analyze the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 21 facial palsy patients. Our experimental results show that among various data modalities (i.e. unstructured data - RGB images and images of facial line segments and structured data - coordinates of facial landmarks and features of facial expressions), the feed-forward neural network using features of facial expression achieved the highest precision of 76.22 while the ResNet-based model using images of facial line segments achieved the highest recall of 83.47. When we leveraged both images of facial line segments and features of facial expressions, our multimodal fusion-based deep learning model slightly improved the precision score to 77.05 at the expense of a decrease in the recall score.

Via

Access Paper or Ask Questions