Abstract: This paper presents an in-depth analysis of self-supervision methods for isolated sign language recognition (ISLR). We consider four recently introduced transformer-based approaches to self-supervised learning from videos and four pre-training data regimes, and study all combinations on the WLASL2000 dataset. Our findings reveal that MaskFeat achieves performance superior to pose-based and supervised video models, with a top-1 accuracy of 79.02% on gloss-based WLASL2000. Furthermore, we analyze these models' ability to produce representations of ASL signs using linear probing on diverse phonological features. This study underscores the importance of architecture and pre-training task choices in ISLR. Specifically, our results on WLASL2000 highlight the power of masked reconstruction pre-training, and our linear probing results demonstrate the importance of hierarchical vision transformers for sign language representation.