Abstract:Young children are at an increased risk of contracting contagious diseases such as COVID-19 due to improper hand hygiene. An autonomous social agent that observes children while handwashing and encourages good hand washing practices could provide an opportunity for handwashing behavior to become a habit. In this article, we present a human action recognition system, which is part of the vision system of a social robot platform, to assist children in developing a correct handwashing technique. A modified convolution neural network (CNN) architecture with Channel Spatial Attention Bilinear Pooling (CSAB) frame, with a VGG-16 architecture as the backbone is trained and validated on an augmented dataset. The modified architecture generalizes well with an accuracy of 90% for the WHO-prescribed handwashing steps even in an unseen environment. Our findings indicate that the approach can recognize even subtle hand movements in the video and can be used for gesture detection and classification in social robotics.