Abstract: Older people are susceptible to falls due to postural instability and deteriorating health. Immediate access to medical support can greatly reduce the repercussions. Hence, there is increasing interest in automated fall detection, often incorporated into a smart healthcare system to provide better monitoring. Existing systems rely on wearable devices, which are inconvenient, or on video monitoring, which raises privacy concerns. Moreover, these systems offer only a limited view of their generalization ability, as they are tested on datasets containing few activities that are widely separated in the action space and easy to differentiate. Complex daily-life scenarios pose much greater challenges, with activities that overlap in the action space due to similar posture or motion. To overcome these limitations, we propose a fall detection model, coined SDFA, based on human skeletons extracted from low-resolution videos. The use of skeleton data ensures privacy, and low-resolution videos ensure low hardware and computational cost. Our model captures discriminative structural displacements and motion trends using unified joint and motion features projected onto a shared high-dimensional space. In particular, the use of separable convolution combined with a powerful GCN architecture yields improved performance. Extensive experiments on five large-scale datasets with a wide range of evaluation settings show that our model achieves competitive performance with extremely low computational complexity and runs faster than existing models.
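To make the unified joint-and-motion feature idea concrete, the following is a minimal PyTorch sketch (our illustration, not the SDFA implementation): joint coordinates and their frame-to-frame differences are projected into a shared embedding space and then refined with a depthwise-separable temporal convolution. All layer sizes, the 17-joint layout, and the module names are illustrative assumptions.

```python
# Illustrative sketch only; shapes and sizes are assumptions.
import torch
import torch.nn as nn

class SeparableTemporalConv(nn.Module):
    """Depthwise temporal convolution followed by a pointwise (1x1) convolution."""
    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        pad = kernel_size // 2
        self.depthwise = nn.Conv2d(channels, channels, (kernel_size, 1),
                                   padding=(pad, 0), groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):  # x: (batch, channels, frames, joints)
        return self.pointwise(self.depthwise(x))

class JointMotionEmbed(nn.Module):
    """Project joints and their temporal differences into one shared space."""
    def __init__(self, in_channels: int = 3, embed: int = 64):
        super().__init__()
        self.joint_proj = nn.Conv2d(in_channels, embed, 1)
        self.motion_proj = nn.Conv2d(in_channels, embed, 1)
        self.temporal = SeparableTemporalConv(embed)

    def forward(self, x):  # x: (batch, 3, frames, joints) coordinates
        motion = x.clone()
        motion[:, :, 1:] = x[:, :, 1:] - x[:, :, :-1]  # frame-to-frame motion
        motion[:, :, :1] = 0                           # no motion for frame 0
        shared = self.joint_proj(x) + self.motion_proj(motion)
        return self.temporal(shared)

# Toy usage: 2 clips, 32 frames, 17 joints (e.g. a COCO-style layout).
feats = JointMotionEmbed()(torch.randn(2, 3, 32, 17))
print(feats.shape)  # torch.Size([2, 64, 32, 17])
```

The depthwise/pointwise split is what keeps the temporal modeling cheap: the depthwise stage mixes only along time per channel, and the 1x1 stage mixes channels, avoiding a full dense temporal convolution.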
Abstract: The increasing pace of population aging calls for better care and support systems. Falling is a frequent and critical problem for elderly people, causing serious long-term health issues. Fall detection from video streams is not an attractive option for real-life applications due to privacy issues. Existing methods try to resolve this issue by using very low-resolution cameras or video encryption; however, such approaches cannot guarantee privacy completely. Key points on the body, such as skeleton joints, convey significant information about motion dynamics and successive posture changes, which are crucial for fall detection. Skeleton joints have been explored for feature extraction, but with image recognition models that ignore joint dependencies across frames, which are important for classifying actions. Moreover, existing models are over-parameterized or evaluated on small datasets with very few activity classes. We propose an efficient graph convolutional network model that exploits spatio-temporal joint dependencies and the dynamics of human skeleton joints for accurate fall detection. Our method leverages a dynamic representation with robust concurrent spatio-temporal characteristics of skeleton joints. We performed extensive experiments on three large-scale datasets. With a significantly smaller model size than most existing methods, our method achieves state-of-the-art results on the large-scale NTU datasets.
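As a concrete reference for the graph-convolution component, here is a minimal sketch of a spatial graph convolution over skeleton joints, the standard building block such models extend; the toy 5-joint chain skeleton and layer sizes are assumptions for demonstration, not the paper's configuration.

```python
# Illustrative sketch only; the 5-joint skeleton and sizes are assumptions.
import torch
import torch.nn as nn

class SkeletonGraphConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalize A + I so aggregated features stay well-scaled.
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).rsqrt()
        self.register_buffer("a_norm", d[:, None] * a * d[None, :])
        self.theta = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):  # x: (batch, channels, frames, joints)
        # Aggregate each joint's neighbours along the skeleton graph.
        x = torch.einsum("bctv,vw->bctw", x, self.a_norm)
        # Channel transform shared across all frames and joints.
        return self.theta(x)

# Toy 5-joint chain skeleton: 0-1-2-3-4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
A = torch.zeros(5, 5)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

layer = SkeletonGraphConv(3, 16, A)
out = layer(torch.randn(2, 3, 32, 5))  # 2 clips, 32 frames, 5 joints
print(out.shape)  # torch.Size([2, 16, 32, 5])
```

Because the same adjacency is applied at every frame, stacking such layers with temporal convolutions is what lets a model capture the cross-frame joint dependencies the abstract refers to.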
Abstract: Autism diagnosis presents a major challenge due to the vast heterogeneity of the condition and the elusive nature of early detection. Atypical gait and gesture patterns are dominant behavioral characteristics of autism and can provide crucial insights for diagnosis. Furthermore, these data can be collected efficiently in a non-intrusive way, facilitating early intervention to optimize positive outcomes. Existing research mainly focuses on associating facial and eye-gaze features with autism. However, very few studies have investigated movement and gesture patterns, which can reveal subtle variations and characteristics specific to autism. To address this gap, we present an analysis of gesture and gait activity in videos to identify children with autism and to quantify the severity of their condition by regressing Autism Diagnostic Observation Schedule (ADOS) scores. Our proposed architecture addresses two key factors: (1) an effective feature representation to manifest irregular gesture patterns and (2) a two-stream co-learning framework to enable a comprehensive understanding of their relation to autism from diverse perspectives without explicitly using an additional data modality. Experimental results demonstrate the efficacy of utilizing gesture and gait-activity videos for autism analysis.
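The abstract does not detail the two-stream co-learning framework; as a hedged sketch of the general pattern, the snippet below trains two encoders on different views of the same clip features with a shared-label loss plus a symmetric consistency term that lets each stream inform the other. Every choice here (feature sizes, the KL-based consistency loss, the class names) is an assumption for illustration, not the paper's exact design.

```python
# Illustrative sketch only; architecture and loss details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamCoLearner(nn.Module):
    def __init__(self, feat_dim: int = 256, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        self.stream_a = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.stream_b = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, num_classes)
        self.head_b = nn.Linear(hidden, num_classes)

    def forward(self, view_a, view_b):
        return (self.head_a(self.stream_a(view_a)),
                self.head_b(self.stream_b(view_b)))

def co_learning_loss(logits_a, logits_b, labels, weight: float = 0.5):
    # Both streams are supervised by the same labels...
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    # ...and a symmetric KL term keeps their predictions mutually consistent.
    consistency = (F.kl_div(F.log_softmax(logits_a, -1), F.softmax(logits_b, -1),
                            reduction="batchmean")
                   + F.kl_div(F.log_softmax(logits_b, -1), F.softmax(logits_a, -1),
                              reduction="batchmean"))
    return ce + weight * consistency

model = TwoStreamCoLearner()
view_a, view_b = torch.randn(4, 256), torch.randn(4, 256)  # two views of one clip
labels = torch.randint(0, 2, (4,))
loss = co_learning_loss(*model(view_a, view_b), labels)
loss.backward()
```

The key property is that both views are derived from the same video, so no extra data modality is required at capture time, which matches the constraint stated in the abstract.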
Abstract: Athlete performance measurement in sports videos requires modeling long sequences, since the entire spatio-temporal progression contributes dominantly to the performance. Accurate evaluation requires comprehending both local discriminative spatial dependencies and global semantics. However, existing benchmark datasets mainly incorporate sports where the performance lasts only a few seconds. Consequently, state-of-the-art sports quality assessment methods focus specifically on spatial structure. Although they achieve high performance on short-term sports, they are unable to model prolonged video sequences and fail to achieve similar performance on long-term sports. To facilitate such analysis, we introduce a new dataset, coined AGF-Olympics, that incorporates artistic gymnastics floor routines. AGF-Olympics provides highly challenging scenarios with extensive background, viewpoint, and scale variations over an extended sample duration of up to 2 minutes. In addition, we propose a discriminative attention module that maps the dense feature space into a sparse representation by disentangling complex associations. Extensive experiments indicate that our proposed module provides an effective way to embed long-range spatial and temporal correlation semantics.
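One common way to map dense attention into a sparse representation, shown below purely as an illustrative sketch rather than the proposed module, is to prune each query's attention scores to its top-k entries before the softmax, so that only the strongest long-range associations survive. The function name, the top-k value, and the token count are assumptions.

```python
# Illustrative sketch only; not the paper's discriminative attention module.
import math
import torch
import torch.nn.functional as F

def sparse_topk_attention(q, k, v, top_k: int = 8):
    # q, k, v: (batch, seq_len, dim); scores: (batch, seq_len, seq_len)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    top_k = min(top_k, scores.size(-1))
    # Per-query threshold: the k-th largest score along the key axis.
    kth = scores.topk(top_k, dim=-1).values[..., -1:]
    # Drop everything below the threshold so each query attends sparsely.
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Long-sequence toy input: 240 temporal tokens (e.g. ~2 minutes of a routine
# at 2 tokens per second), 64-dimensional features.
x = torch.randn(2, 240, 64)
out = sparse_topk_attention(x, x, x, top_k=16)
print(out.shape)  # torch.Size([2, 240, 64])
```

Sparsifying before the softmax keeps the output a proper convex combination of the retained values, while the pruning itself disentangles the dense all-pairs score matrix into a small set of dominant associations per position.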