Abstract:We propose a new approach to Human Action Evaluation (HAE) in long videos using graph-based multi-task modeling. Previous works in activity assessment either directly compute a metric using a detected skeleton or use the scene information to regress the activity score. These approaches are insufficient for accurate activity assessment since they only compute an average score over a clip, and do not consider the correlation between the joints and body dynamics. Moreover, they are highly scene-dependent which makes the generalizability of these methods questionable. We propose a novel multi-task framework for HAE that utilizes a Graph Convolutional Network backbone to embed the interconnection between human joints in the features. In this framework, we solve the Human Action Detection (HAD) problem as an auxiliary task to improve activity assessment. The HAD head is powered by an Encoder-Decoder Temporal Convolutional Network to detect activities in long videos and HAE uses a Long-Short-Term-Memory-based architecture. We evaluate our method on the UW-IOM and TUM Kitchen datasets and discuss the success and failure cases on these two datasets.
Abstract:Recognition of human actions and associated interactions with objects and the environment is an important problem in computer vision due to its potential applications in a variety of domains. The most versatile methods can generalize to various environments and deal with cluttered backgrounds, occlusions, and viewpoint variations. Among them, methods based on graph convolutional networks that extract features from the skeleton have demonstrated promising performance. In this paper, we propose a novel Spatio-Temporal Pyramid Graph Convolutional Network (ST-PGN) for online action recognition for ergonomic risk assessment that enables the use of features from all levels of the skeleton feature hierarchy. The proposed algorithm outperforms state-of-art action recognition algorithms tested on two public benchmark datasets typically used for postural assessment (TUM and UW-IOM). We also introduce a pipeline to enhance postural assessment methods with online action recognition techniques. Finally, the proposed algorithm is integrated with a traditional ergonomic risk index (REBA) to demonstrate the potential value for assessment of musculoskeletal disorders in occupational safety.
Abstract:Automated real-time prediction of the ergonomic risks of manipulating objects is a key unsolved challenge in developing effective human-robot collaboration systems for logistics and manufacturing applications. We present a foundational paradigm to address this challenge by formulating the problem as one of action segmentation from RGB-D camera videos. Spatial features are first learned using a deep convolutional model from the video frames, which are then fed sequentially to temporal convolutional networks to semantically segment the frames into a hierarchy of actions, which are either ergonomically safe, require monitoring, or need immediate attention. For performance evaluation, in addition to an open-source kitchen dataset, we collected a new dataset comprising twenty individuals picking up and placing objects of varying weights to and from cabinet and table locations at various heights. Results show very high (87-94)% F1 overlap scores among the ground truth and predicted frame labels for videos lasting over two minutes and comprising a large number of actions.
Abstract:One of the challenges in model-based control of stochastic dynamical systems is that the state transition dynamics are involved, and it is not easy or efficient to make good-quality predictions of the states. Moreover, there are not many representational models for the majority of autonomous systems, as it is not easy to build a compact model that captures the entire dynamical subtleties and uncertainties. In this work, we present a hierarchical Bayesian linear regression model with local features to learn the dynamics of a micro-robotic system as well as two simpler examples, consisting of a stochastic mass-spring damper and a stochastic double inverted pendulum on a cart. The model is hierarchical since we assume non-stationary priors for the model parameters. These non-stationary priors make the model more flexible by imposing priors on the priors of the model. To solve the maximum likelihood (ML) problem for this hierarchical model, we use the variational expectation maximization (EM) algorithm, and enhance the procedure by introducing hidden target variables. The algorithm yields parsimonious model structures, and consistently provides fast and accurate predictions for all our examples involving large training and test sets. This demonstrates the effectiveness of the method in learning stochastic dynamics, which makes it suitable for future use in a paradigm, such as model-based reinforcement learning, to compute optimal control policies in real time.