Abstract:Sign language is a visual language used by the deaf and dumb community to communicate. However, for most recognition methods based on monocular cameras, the recognition accuracy is low and the robustness is poor. Even if the effect is good on some data, it may perform poorly in other data with different interference due to the inability to extract effective features. To solve these problems, we propose a sign language recognition network that integrates skeleton features of hands and facial expression. Among this, we propose a hand skeleton feature extraction based on coordinate transformation to describe the shape of the hand more accurately. Moreover, by incorporating facial expression information, the accuracy and robustness of sign language recognition are finally improved, which was verified on A Dataset for Argentinian Sign Language and SEU's Chinese Sign Language Recognition Database (SEUCSLRD).
Abstract:Enormous hand images with reliable annotations are collected through marker-based MoCap. Unfortunately, degradations caused by markers limit their application in hand appearance reconstruction. A clear appearance recovery insight is an image-to-image translation trained with unpaired data. However, most frameworks fail because there exists structure inconsistency from a degraded hand to a bare one. The core of our approach is to first disentangle the bare hand structure from those degraded images and then wrap the appearance to this structure with a dual adversarial discrimination (DAD) scheme. Both modules take full advantage of the semi-supervised learning paradigm: The structure disentanglement benefits from the modeling ability of ViT, and the translator is enhanced by the dual discrimination on both translation processes and translation results. Comprehensive evaluations have been conducted to prove that our framework can robustly recover photo-realistic hand appearance from diverse marker-contained and even object-occluded datasets. It provides a novel avenue to acquire bare hand appearance data for other downstream learning problems.The codes will be publicly available at https://www.yangangwang.com