Abstract:The availability of real data from areas with high privacy requirements, such as the medical intervention space, is low and the acquisition legally complex. Therefore, this work presents a way to create a synthetic dataset for the medical context, using medical clothing as an example. The goal is to close the reality gap between the synthetic and real data. For this purpose, methods of 3D-scanned clothing and designed clothing are compared in a Domain-Randomization and Structured-Domain-Randomization scenario using an Unreal-Engine plugin or Unity. Additionally a Mixed-Reality dataset in front of a greenscreen and a target domain dataset were used. Our experiments show, that Structured-Domain-Randomization of designed clothing together with Mixed-Reality data provide a baseline achieving 72.0% mAP on a test dataset of the clinical target domain. When additionally using 15% of available target domain train data, the gap towards 100% (660 images) target domain train data could be nearly closed 80.05% mAP (81.95% mAP). Finally we show that when additionally using 100% target domain train data the accuracy could be increased to 83.35% mAP.
Abstract:In this paper we introduce EfficientPose, a new approach for 6D object pose estimation. Our method is highly accurate, efficient and scalable over a wide range of computational resources. Moreover, it can detect the 2D bounding box of multiple objects and instances as well as estimate their full 6D poses in a single shot. This eliminates the significant increase in runtime when dealing with multiple objects other approaches suffer from. These approaches aim to first detect 2D targets, e.g. keypoints, and solve a Perspective-n-Point problem for their 6D pose for each object afterwards. We also propose a novel augmentation method for direct 6D pose estimation approaches to improve performance and generalization, called 6D augmentation. Our approach achieves a new state-of-theart accuracy of 97.35% in terms of the ADD(-S) metric on the widely-used 6D pose estimation benchmark dataset Linemod using RGB input, while still running end-to-end at over 27 FPS. Through the inherent handling of multiple objects and instances and the fused single shot 2D object detection as well as 6D pose estimation, our approach runs even with multiple objects (eight) end-to-end at over 26 FPS, making it highly attractive to many real world scenarios. Code will be made publicly available at https://github.com/ybkscht/EfficientPose.