Abstract:RF sensors have been recently proposed as a new modality for sign language processing technology. They are non-contact, effective in the dark, and acquire a direct measurement of signing kinematic via exploitation of the micro-Doppler effect. First, this work provides an in depth, comparative examination of the kinematic properties of signing as measured by RF sensors for both fluent ASL users and hearing imitation signers. Second, as ASL recognition techniques utilizing deep learning requires a large amount of training data, this work examines the effect of signing kinematics and subject fluency on adversarial learning techniques for data synthesis. Two different approaches for the synthetic training data generation are proposed: 1) adversarial domain adaptation to minimize the differences between imitation signing and fluent signing data, and 2) kinematically-constrained generative adversarial networks for accurate synthesis of RF signing signatures. The results show that the kinematic discrepancies between imitation signing and fluent signing are so significant that training on data directly synthesized from fluent RF signers offers greater performance (93% top-5 accuracy) than that produced by adaptation of imitation signing (88% top-5 accuracy) when classifying 100 ASL signs.
Abstract:Differentiating multivariate dynamic signals is a difficult learning problem as the feature space may be large yet often only a few training examples are available. Traditional approaches to this problem either proceed from handcrafted features or require large datasets to combat the m >> n problem. In this paper, we show that the source of the problem---signal dynamics---can be used to our advantage and noticeably improve classification performance on a range of discrimination tasks when training data is scarce. We demonstrate that self-supervised pre-training guided by signal dynamics produces embedding that generalizes across tasks, datasets, data collection sites, and data distributions. We perform an extensive evaluation of this approach on a range of tasks including simulated data, keyword detection problem, and a range of functional neuroimaging data, where we show that a single embedding learnt on healthy subjects generalizes across a number of disorders, age groups, and datasets.