Abstract:Autism diagnosis presents a major challenge due to the vast heterogeneity of the condition and the elusive nature of early detection. Atypical gait and gesture patterns are dominant behavioral characteristics of autism and can provide crucial insights for diagnosis. Furthermore, these data can be collected efficiently in a non-intrusive way, facilitating early intervention to optimize positive outcomes. Existing research mainly focuses on associating facial and eye-gaze features with autism. However, very few studies have investigated movement and gesture patterns which can reveal subtle variations and characteristics that are specific to autism. To address this gap, we present an analysis of gesture and gait activity in videos to identify children with autism and quantify the severity of their condition by regressing autism diagnostic observation schedule scores. Our proposed architecture addresses two key factors: (1) an effective feature representation to manifest irregular gesture patterns and (2) a two-stream co-learning framework to enable a comprehensive understanding of its relation to autism from diverse perspectives without explicitly using additional data modality. Experimental results demonstrate the efficacy of utilizing gesture and gait-activity videos for autism analysis.
Abstract:Athlete performance measurement in sports videos requires modeling long sequences since the entire spatio-temporal progression contributes dominantly to the performance. It is crucial to comprehend local discriminative spatial dependencies and global semantics for accurate evaluation. However, existing benchmark datasets mainly incorporate sports where the performance lasts only a few seconds. Consequently, state-ofthe-art sports quality assessment methods specifically focus on spatial structure. Although they achieve high performance in short-term sports, they are unable to model prolonged video sequences and fail to achieve similar performance in long-term sports. To facilitate such analysis, we introduce a new dataset, coined AGF-Olympics, that incorporates artistic gymnastic floor routines. AFG-Olympics provides highly challenging scenarios with extensive background, viewpoint, and scale variations over an extended sample duration of up to 2 minutes. In addition, we propose a discriminative attention module to map the dense feature space into a sparse representation by disentangling complex associations. Extensive experiments indicate that our proposed module provides an effective way to embed long-range spatial and temporal correlation semantics.