Topic: 3D Human Pose Estimation

What is 3D Human Pose Estimation? 3D Human Pose Estimation is a computer vision task that involves estimating the 3D positions and orientations of body joints and bones from 2D images or videos. The goal is to reconstruct a person's 3D pose, often in real time, for applications such as virtual reality, human-computer interaction, and motion analysis.

Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives

May 29, 2024

Mingqi Yuan, Huijiang Wang, Kai-Fung Chu, Fumiya Iida, Bo Li, Wenjun Zeng

Abstract:Advances in artificial intelligence (AI) have been propelling the evolution of human-robot interaction (HRI) technologies. However, significant challenges remain in achieving seamless interactions, particularly in tasks requiring physical contact with humans. These challenges arise from the need for accurate real-time perception of human actions, adaptive control algorithms for robots, and the effective coordination between human and robotic movements. In this paper, we propose an approach to enhancing physical HRI with a focus on dynamic robot-assisted hand-object interaction (HOI). Our methodology integrates hand pose estimation, adaptive robot control, and motion primitives to facilitate human-robot collaboration. Specifically, we employ a transformer-based algorithm to perform real-time 3D modeling of human hands from single RGB images, based on which a motion primitives model (MPM) is designed to translate human hand motions into robotic actions. The robot's action implementation is dynamically fine-tuned using the continuously updated 3D hand models. Experimental validations, including a ring-wearing task, demonstrate the system's effectiveness in adapting to real-time movements and assisting in precise task executions.

* 8 pages, 10 figures

WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD

May 03, 2024

Xuxin Cheng, Heng Yu, Harry Zhang, Wenxing Deng

Abstract:We present a novel method for robotic manipulation tasks in human environments that require reasoning about the 3D geometric relationship between a pair of objects. Traditional end-to-end trained policies, which map from pixel observations to low-level robot actions, struggle to reason about complex pose relationships and have difficulty generalizing to unseen object configurations. To address these challenges, we propose a method that learns to reason about the 3D geometric relationship between objects, focusing on the relationship between key parts on one object with respect to key parts on another object. Our standalone model utilizes Weighted SVD to reason about both pose relationships between articulated parts and between free-floating objects. This approach allows the robot to understand the relationship between the oven door and the oven body, as well as the relationship between the lasagna plate and the oven, for example. By considering the 3D geometric relationship between objects, our method enables robots to perform complex manipulation tasks that reason about object-centric representations. We open source the code and demonstrate the results here

* arXiv admin note: text overlap with arXiv:2211.09325
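
As an illustration of the Weighted SVD step named in the abstract, here is a minimal NumPy sketch of the standard weighted least-squares rigid alignment (often called weighted Kabsch). It assumes point correspondences and per-point weights are already available. The function name `weighted_rigid_transform` and its arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def weighted_rigid_transform(src, dst, weights):
    """Weighted least-squares rigid transform (R, t) mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding 3D points (assumed given).
    weights:  (N,) non-negative per-correspondence weights (assumed given).
    """
    w = weights / weights.sum()           # normalise weights to sum to 1
    src_mean = w @ src                    # weighted centroids
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean
    # Weighted cross-covariance matrix (3 x 3)
    H = (src_c * w[:, None]).T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

Given noise-free correspondences, `R` and `t` recover the true transform for any positive weights; with noisy data, larger weights pull the fit toward the correspondences they mark as more reliable.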

Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding

Apr 17, 2024

George Retsinas, Niki Efthymiou, Petros Maragos

Abstract:Modern agricultural applications rely more and more on deep learning solutions. However, training well-performing deep networks requires a large amount of annotated data that may not be available and, in the case of 3D annotation, may not even be feasible for human annotators. In this work, we develop a deep learning approach to segment mushrooms and estimate their pose on 3D data, in the form of point clouds acquired by depth sensors. To circumvent the annotation problem, we create a synthetic dataset of mushroom scenes, where we are fully aware of 3D information, such as the pose of each mushroom. The proposed network has a fully convolutional backbone that parses sparse 3D data and predicts pose information that implicitly defines both the instance segmentation and pose estimation tasks. We have validated the effectiveness of the proposed implicit-based approach on a synthetic test set, and we provide qualitative results for a small set of real point clouds acquired with depth sensors. Code is publicly available at https://github.com/georgeretsi/mushroom-pose.

SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers

Apr 19, 2024

Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, Ali Etemad

Abstract:We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhances the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings, and observe strong performance with respect to prior works. Moreover, ablation experiments demonstrate the impact of each of the modules of our architecture. Finally, we study the performance of our method in dealing with noise and heavy occlusions and find considerable robustness with respect to other solutions.

* 12 pages, 8 figures
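
The abstract does not say how multi-view 2D keypoints are lifted to 3D joint positions. One common choice, assumed here purely for illustration, is linear (DLT) triangulation from calibrated cameras. The sketch below triangulates a single joint. The function name `triangulate_point` and the input layout are hypothetical and do not describe the SkelFormer implementation.

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3D point from several views.

    proj_mats: sequence of (3, 4) camera projection matrices (assumed known).
    points_2d: sequence of (x, y) pixel observations, one per camera.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenise
```

With noisy detections this gives a least-squares estimate in the algebraic sense, which is one reason a downstream module that tolerates noisy 3D joints is useful.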

HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model

Jun 04, 2024

Yu Tian, Tianqi Shao, Tsukasa Demizu, Xuyang Wu, Hsin-Tai Wu

Abstract:The head pose estimation (HPE) task requires a sophisticated understanding of 3D spatial relationships and precise numerical output of yaw, pitch, and roll Euler angles. Previous HPE studies are mainly based on non-large language models (Non-LLMs), which rely on close-up human heads cropped from the full image as inputs and lack robustness in real-world scenarios. In this paper, we present a novel framework to enhance the HPE prediction task by leveraging the visual grounding capability of CogVLM. CogVLM is a vision language model (VLM) with the grounding capability of predicting object bounding boxes (BBoxes), which enables HPE training and prediction using full image information as input. To integrate the HPE task into the VLM, we first cope with the catastrophic forgetting problem in large language models (LLMs) by investigating the rehearsal ratio in the data rehearsal method. Then, we propose and validate a LoRA layer-based model merging method, which keeps the integrity of parameters, to enhance the HPE performance in the framework. The results show our HPE-CogVLM achieves a 31.5% reduction in Mean Absolute Error for HPE prediction over the current Non-LLM based state-of-the-art in cross-dataset evaluation. Furthermore, we compare our LoRA layer-based model merging method with LoRA fine-tuning only and other merging methods in CogVLM. The results demonstrate our framework outperforms them in all HPE metrics.
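
To make the Mean Absolute Error metric over yaw, pitch, and roll concrete, here is a small NumPy sketch. It assumes angles are given in degrees and wraps differences into [-180, 180) before averaging. The wrapping and the function name `angular_mae` are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np

def angular_mae(pred_deg, true_deg):
    """Mean absolute error for (N, 3) arrays of yaw, pitch, roll in degrees.

    Returns the per-angle MAE (length-3 array) and their overall mean.
    """
    diff = np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    # Wrap so that e.g. 179 deg vs -179 deg counts as a 2 deg error (assumption)
    diff = (diff + 180.0) % 360.0 - 180.0
    per_angle = np.abs(diff).mean(axis=0)
    return per_angle, per_angle.mean()

pred = [[10.0, -5.0, 0.0], [179.0, 0.0, 2.0]]
true = [[12.0, -5.0, 1.0], [-179.0, 0.0, 0.0]]
per_angle, overall = angular_mae(pred, true)
print(per_angle, overall)
```

A relative reduction such as the 31.5% quoted above would be computed as `(baseline_mae - new_mae) / baseline_mae` on this kind of score.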

An iterative closest point algorithm for marker-free 3D shape registration of continuum robots

May 24, 2024

Matthias K. Hoffmann, Julian Mühlenhoff, Zhaoheng Ding, Thomas Sattel, Kathrin Flaßkamp

Abstract:Continuum robots have emerged as a promising technology in the medical field due to their potential for accessing deep-seated locations of the human body with low surgical trauma. When deriving physics-based models for these robots, evaluating the models poses a significant challenge due to the difficulty in accurately measuring their intricate shapes. In this work, we present an optimization-based 3D shape registration algorithm for estimation of the backbone shape of slender continuum robots as part of a photogrammetric measurement. Our approach to estimating the backbones optimally matches a parametric three-dimensional curve to images of the robot. Since we incorporate an iterative closest point algorithm into our method, we do not need prior knowledge of the robot's position within the respective images. In our experiments with artificial and real images of a concentric tube continuum robot, we found an average maximum deviation of the reconstruction from simulation data of 0.665 mm and 0.939 mm from manual measurements. These results show that our algorithm is capable of producing high-accuracy positional data from images of continuum robots.

* 11 pages, 8 figures, 2 algorithms, journal
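
For readers unfamiliar with the iterative closest point (ICP) idea mentioned in the abstract, the following is a generic textbook point-to-point ICP sketch in NumPy. It is not the authors' curve-to-image registration: it assumes two small point sets, brute-force nearest-neighbour matching, and a rigid transform. All function names are hypothetical.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, dst_mean - R @ src_mean

def icp(src, dst, iterations=50, tol=1e-10):
    """Align src to dst by alternating matching and rigid fitting."""
    current = src.copy()
    err, prev_err = np.inf, np.inf
    for _ in range(iterations):
        # 1. Match every current point to its closest point in dst
        d2 = ((current[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = dst[d2.argmin(axis=1)]
        # 2. Fit the rigid transform for these matches and apply it
        R, t = best_rigid_transform(current, nearest)
        current = current @ R.T + t
        # 3. Stop once the mean matching error no longer changes
        err = np.linalg.norm(current - nearest, axis=1).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return current, err
```

Because the matches are re-estimated every iteration, no correspondences need to be supplied up front. The result still depends on a reasonable starting pose, since ICP only converges to a local optimum.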

Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds

Apr 18, 2024

Oliver Lemke, Zuria Bauer, René Zurbrügg, Marc Pollefeys, Francis Engelmann, Hermann Blum

Abstract:In recent years, modern techniques in deep learning and large-scale datasets have led to impressive progress in 3D instance segmentation, grasp pose estimation, and robotics. This allows for accurate detection directly in 3D scenes, object- and environment-aware grasp prediction, as well as robust and repeatable robotic manipulation. This work aims to integrate these recent methods into a comprehensive framework for robotic interaction and manipulation in human-centric environments. Specifically, we leverage 3D reconstructions from a commodity 3D scanner for open-vocabulary instance segmentation, alongside grasp pose estimation, to demonstrate dynamic picking of objects, and opening of drawers. We show the performance and robustness of our model in two sets of real-world experiments including dynamic object retrieval and drawer opening, reporting a 51% and 82% success rate respectively. Code of our framework as well as videos are available on: https://spot-compose.github.io/.

* Accepted at ICRA 2024 Workshops. Code and videos available at https://spot-compose.github.io/

Toon3D: Seeing Cartoons from a New Perspective

May 17, 2024

Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A. Efros, Angjoo Kanazawa

Abstract:In this work, we recover the underlying 3D structure of non-geometrically consistent scenes. We focus our analysis on hand-drawn images from cartoons and anime. Many cartoons are created by artists without a 3D rendering engine, which means that any new image of a scene is hand-drawn. The hand-drawn images are usually faithful representations of the world, but only in a qualitative sense, since it is difficult for humans to draw multiple perspectives of an object or scene 3D consistently. Nevertheless, people can easily perceive 3D scenes from inconsistent inputs! In this work, we correct for 2D drawing inconsistencies to recover a plausible 3D structure such that the newly warped drawings are consistent with each other. Our pipeline consists of a user-friendly annotation tool, camera pose estimation, and image deformation to recover a dense structure. Our method warps images to obey a perspective camera model, enabling our aligned results to be plugged into novel-view synthesis reconstruction methods to experience cartoons from viewpoints never drawn before. Our project page is https://toon3d.studio .

* Please see our project page: https://toon3d.studio

LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Apr 08, 2024

Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li

Abstract:Human pose and shape (HPS) estimation with lensless imaging is not only beneficial to privacy protection but can also be used in covert surveillance scenarios due to the small size and simple structure of the device. However, this task presents significant challenges due to the inherent ambiguity of the captured measurements, and there is a lack of effective methods for directly estimating human pose and shape from lensless data. In this paper, we propose what is, to our knowledge, the first end-to-end framework to recover 3D human poses and shapes from lensless measurements. We specifically design a multi-scale lensless feature decoder to decode the lensless measurements through the optically encoded mask for efficient feature extraction. We also propose a double-head auxiliary supervision mechanism to improve the estimation accuracy of human limb ends. In addition, we establish a lensless imaging system and verify the effectiveness of our method on various datasets acquired by this system.

* Accepted to CVPR 2024. More results available at https://cic.tju.edu.cn/faculty/likun/projects/LPSNet

3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information

Jun 03, 2024

Sihan Wen, Xiantan Zhu, Zhiming Tan

Abstract:In recent years, a plethora of diverse methods have been proposed for 3D pose estimation. Among these, self-attention mechanisms and graph convolutions have both been proven to be effective and practical methods. Recognizing the strengths of those two techniques, we have developed a novel Semantic Graph Attention Network which can benefit from the ability of self-attention to capture global context, while also utilizing graph convolutions to handle the local connectivity and structural constraints of the skeleton. We also design a Body Part Decoder that assists in extracting and refining the information related to specific segments of the body. Furthermore, our approach incorporates Distance Information, enhancing our model's capability to comprehend and accurately predict spatial relationships. Finally, we introduce a Geometry Loss that imposes a critical constraint on the structural skeleton of the body, ensuring that the model's predictions adhere to the natural limits of human posture. The experimental results validate the effectiveness of our approach, demonstrating that every element within the system is essential for improving pose estimation outcomes. Compared with the state of the art, the proposed work not only meets but exceeds the existing benchmarks.
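
The abstract does not define the Geometry Loss. As one plausible example of a structural constraint on the skeleton, the sketch below penalises differences in bone lengths between predicted and reference joints. The bone-length formulation, the function names, and the toy three-joint skeleton are all assumptions made for illustration and may differ from the paper's loss.

```python
import numpy as np

def bone_lengths(joints, bones):
    """Length of each bone for joints of shape (..., J, 3).

    bones: list of (parent_index, child_index) pairs (hypothetical skeleton).
    """
    joints = np.asarray(joints, dtype=float)
    parents = [p for p, _ in bones]
    children = [c for _, c in bones]
    return np.linalg.norm(joints[..., children, :] - joints[..., parents, :], axis=-1)

def bone_length_loss(pred, target, bones):
    """Mean absolute difference between predicted and reference bone lengths."""
    return np.abs(bone_lengths(pred, bones) - bone_lengths(target, bones)).mean()

# Toy chain of three joints along the y-axis
bones = [(0, 1), (1, 2)]
target = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 2.0, 0.0]])
loss = bone_length_loss(pred, target, bones)
print(loss)
```

A term like this would typically be added to a per-joint position loss so that predictions stay anatomically consistent even when individual joints are uncertain.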
