Nima Ghorbani

MPI for Intelligent Systems

SOMA: Solving Optical Marker-Based MoCap Automatically

Oct 09, 2021

Nima Ghorbani, Michael J. Black

Abstract: Marker-based optical motion capture (mocap) is the "gold standard" method for acquiring accurate 3D human motion in computer vision, medicine, and graphics. The raw output of these systems consists of noisy and incomplete 3D points or short tracklets of points. To be useful, one must associate these points with corresponding markers on the captured subject; that is, "labeling". Given these labels, one can then "solve" for the 3D skeleton or body surface mesh. Commercial auto-labeling tools require a specific calibration procedure at capture time, which is not possible for archival data. Here we train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale without any calibration data, independent of the capture technology, and with only minimal human intervention. Our key insight is that, while labeling point clouds is highly ambiguous, the 3D body provides strong constraints on the solution that can be exploited by a learning-based method. To enable learning, we generate massive training sets of simulated noisy and ground truth mocap markers animated by 3D bodies from AMASS. SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body and an optimal transport layer to constrain the assignment (labeling) problem while rejecting outliers. We extensively evaluate SOMA both quantitatively and qualitatively. SOMA is more accurate and robust than existing state-of-the-art research methods and can be applied where commercial systems cannot. We automatically label over 8 hours of archival mocap data across 4 different datasets captured using various technologies and output SMPL-X body models. The model and data are released for research purposes at https://soma.is.tue.mpg.de/.

* ICCV 2021
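
The SOMA abstract mentions an optimal transport layer that constrains the point-to-label assignment while rejecting outliers. The sketch below illustrates that general idea only; it is not the SOMA implementation. It assumes a plain Sinkhorn normalization over a score matrix augmented with a "null" row and column, and every name in it (`sinkhorn_with_null`, `hard_labels`, `null_score`) is hypothetical.

```python
import math


def sinkhorn_with_null(scores, null_score=0.0, iters=50):
    """Soft-assign n points to m labels, with an extra 'null' bin for outliers.

    scores[i][j] is a hypothetical affinity between point i and label j
    (for example, produced by a network). This is an illustrative sketch,
    not the layer used in SOMA.
    """
    n, m = len(scores), len(scores[0])
    # Add a null column (points with no label) and a null row (labels with no point).
    aug = [list(row) + [null_score] for row in scores] + [[null_score] * (m + 1)]
    kernel = [[math.exp(value) for value in row] for row in aug]
    # Each real point and each real label carries unit mass;
    # the null bins absorb whatever is left over.
    row_mass = [1.0] * n + [float(m)]
    col_mass = [1.0] * m + [float(n)]
    u = [1.0] * (n + 1)
    v = [1.0] * (m + 1)
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target masses.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m + 1))
             for i in range(n + 1)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n + 1))
             for j in range(m + 1)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m + 1)]
            for i in range(n + 1)]


def hard_labels(plan, n, m):
    """Pick the most likely label per point; None means 'rejected as outlier'."""
    labels = []
    for i in range(n):
        best = max(range(m + 1), key=lambda j: plan[i][j])
        labels.append(best if best < m else None)
    return labels


if __name__ == "__main__":
    # Two points that clearly match labels 0 and 1, plus one stray point.
    toy_scores = [[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]]
    plan = sinkhorn_with_null(toy_scores)
    print(hard_labels(plan, n=3, m=2))
```

The null bins are what allow a varying number of points: a stray point can send its mass to the null column instead of being forced onto a real label, and a label with no visible marker can send its mass to the null row.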

hSMAL: Detailed Horse Shape and Pose Reconstruction for Motion Pattern Recognition

Jun 18, 2021

Ci Li, Nima Ghorbani, Sofia Broomé, Maheen Rashid, Michael J. Black, Elin Hernlund, Hedvig Kjellström, Silvia Zuffi

Abstract: In this paper we present our preliminary work on model-based behavioral analysis of horse motion. Our approach is based on the SMAL model, a 3D articulated statistical model of animal shape. We define a novel SMAL model for horses based on a new template, skeleton and shape space learned from 37 horse toys. We test the accuracy of our hSMAL model in reconstructing a horse from 3D mocap data and images. We apply the hSMAL model to the problem of lameness detection from video, where we fit the model to images to recover 3D pose and train an ST-GCN network on pose data. A comparison with the same network trained on mocap points illustrates the benefit of our approach.

* CV4Animals Workshop in CVPR 2021

GRAB: A Dataset of Whole-Body Human Grasping of Objects

Aug 25, 2020

Omid Taheri, Nima Ghorbani, Michael J. Black, Dimitrios Tzionas

Abstract: Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time. While "grasping" is commonly thought of as a single hand stably lifting an object, we capture the motion of the entire body and adopt the generalized notion of "whole-body grasps". Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. Given MoCap markers, we fit the full 3D body shape and pose, including the articulated face and hands, as well as the 3D object pose. This gives detailed 3D meshes over time, from which we compute contact between the body and object. This is a unique dataset that goes well beyond existing ones for modeling and understanding how humans grasp and manipulate objects, how their full body is involved, and how interaction varies with the task. We illustrate the practical value of GRAB with an example application: we train GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes. The dataset and code are available for research purposes at https://grab.is.tue.mpg.de.

* ECCV 2020

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Apr 11, 2019

Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black

Abstract: To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.

* To appear in CVPR 2019

AMASS: Archive of Motion Capture as Surface Shapes

Apr 05, 2019

Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, Michael J. Black

Abstract: Large datasets are the cornerstone of recent advances in computer vision using deep learning. In contrast, existing human motion capture (mocap) datasets are small and the motions limited, hampering progress on learning models of human motion. While there are many different datasets available, they each use a different parameterization of the body, making it difficult to integrate them into a single meta dataset. To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. We achieve this using a new method, MoSh++, that converts mocap data into realistic 3D human meshes represented by a rigged body model; here we use SMPL [doi:10.1145/2816795.2818013], which is widely used and provides a standard skeletal representation as well as a fully rigged surface mesh. The method works for arbitrary marker sets, while recovering soft-tissue dynamics and realistic hand motion. We evaluate MoSh++ and tune its hyperparameters using a new dataset of 4D body scans that are jointly recorded with marker-based mocap. The consistent representation of AMASS makes it readily useful for animation, visualization, and generating training data for deep learning. Our dataset is significantly richer than previous human motion collections, with more than 40 hours of motion data spanning over 300 subjects and more than 11,000 motions, and will be publicly available to the research community.

* 12 pages, 9 figures, 1 table
