Abstract:Human motion generation has paramount importance in computer animation. It is a challenging generative temporal modelling task due to the vast possibilities of human motion, high human sensitivity to motion coherence and the difficulty of accurately generating fine-grained motions. Recently, diffusion methods have been proposed for human motion generation due to their high sample quality and expressiveness. However, generated sequences still suffer from motion incoherence, and are limited to short duration, and simpler motion and take considerable time during inference. To address these limitations, we propose \textit{RecMoDiffuse: Recurrent Flow Diffusion}, a new recurrent diffusion formulation for temporal modelling. Unlike previous work, which applies diffusion to the whole sequence without any temporal dependency, an approach that inherently makes temporal consistency hard to achieve. Our method explicitly enforces temporal constraints with the means of normalizing flow models in the diffusion process and thereby extends diffusion to the temporal dimension. We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion. Our experiments show that RecMoDiffuse achieves comparable results with state-of-the-art methods while generating coherent motion sequences and reducing the computational overhead in the inference stage.
Abstract:We propose DynamicSurf, a model-free neural implicit surface reconstruction method for high-fidelity 3D modelling of non-rigid surfaces from monocular RGB-D video. To cope with the lack of multi-view cues in monocular sequences of deforming surfaces, one of the most challenging settings for 3D reconstruction, DynamicSurf exploits depth, surface normals, and RGB losses to improve reconstruction fidelity and optimisation time. DynamicSurf learns a neural deformation field that maps a canonical representation of the surface geometry to the current frame. We depart from current neural non-rigid surface reconstruction models by designing the canonical representation as a learned feature grid which leads to faster and more accurate surface reconstruction than competing approaches that use a single MLP. We demonstrate DynamicSurf on public datasets and show that it can optimize sequences of varying frames with $6\times$ speedup over pure MLP-based approaches while achieving comparable results to the state-of-the-art methods. Project is available at https://mirgahney.github.io//DynamicSurf.io/.
Abstract:We propose Geometric Neural Parametric Models (GNPM), a learned parametric model that takes into account the local structure of data to learn disentangled shape and pose latent spaces of 4D dynamics, using a geometric-aware architecture on point clouds. Temporally consistent 3D deformations are estimated without the need for dense correspondences at training time, by exploiting cycle consistency. Besides its ability to learn dense correspondences, GNPMs also enable latent-space manipulations such as interpolation and shape/pose transfer. We evaluate GNPMs on various datasets of clothed humans, and show that it achieves comparable performance to state-of-the-art methods that require dense correspondences during training.
Abstract:Thanks to their improved data efficiency, equivariant neural networks have gained increased interest in the deep learning community. They have been successfully applied in the medical domain where symmetries in the data can be effectively exploited to build more accurate and robust models. To be able to reach a much larger body of patients, mobile, on-device implementations of deep learning solutions have been developed for medical applications. However, equivariant models are commonly implemented using large and computationally expensive architectures, not suitable to run on mobile devices. In this work, we design and test an equivariant version of MobileNetV2 and further optimize it with model quantization to enable more efficient inference. We achieve close-to state of the art performance on the Patch Camelyon (PCam) medical dataset while being more computationally efficient.