Abstract:Various neural network based methods are capable of anticipating human body motions from data for a short period of time. What these methods lack are the interpretability and explainability of the network and its results. We propose to use Dynamic Mode Decomposition with delays to represent and anticipate human body motions. Exploring the influence of the number of delays on the reconstruction and prediction of various motion classes, we show that the anticipation errors in our results are comparable or even better for very short anticipation times ($<0.4$ sec) to a recurrent neural network based method. We perceive our method as a first step towards the interpretability of the results by representing human body motions as linear combinations of ``factors''. In addition, compared to the neural network based methods large training times are not needed. Actually, our methods do not even regress to any other motions than the one to be anticipated and hence is of a generic nature.
Abstract:The key prerequisite for accessing the huge potential of current machine learning techniques is the availability of large databases that capture the complex relations of interest. Previous datasets are focused on either 3D scene representations with semantic information, tracking of multiple persons and recognition of their actions, or activity recognition of a single person in captured 3D environments. We present Bonn Activity Maps, a large-scale dataset for human tracking, activity recognition and anticipation of multiple persons. Our dataset comprises four different scenes that have been recorded by time-synchronized cameras each only capturing the scene partially, the reconstructed 3D models with semantic annotations, motion trajectories for individual people including 3D human poses as well as human activity annotations. We utilize the annotations to generate activity likelihoods on the 3D models called activity maps.