Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shijie Sun

Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation

Mar 20, 2024

Qitong Yang, Mingtao Feng, Zijie Wu, Shijie Sun, Weisheng Dong, Yaonan Wang, Ajmal Mian

Figure 1 for Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation

Figure 2 for Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation

Figure 3 for Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation

Figure 4 for Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation

Abstract:Directly learning to model 4D content, including shape, color and motion, is challenging. Existing methods depend on skeleton-based motion control and offer limited continuity in detail. To address this, we propose a novel framework that generates coherent 4D sequences with animation of 3D shapes under given conditions with dynamic evolution of shape and color over time through integrative latent mapping. We first employ an integrative latent unified representation to encode shape and color information of each detailed 3D geometry frame. The proposed skeleton-free latent 4D sequence joint representation allows us to leverage diffusion models in a low-dimensional space to control the generation of 4D sequences. Finally, temporally coherent 4D sequences are generated conforming well to the input images and text prompts. Extensive experiments on the ShapeNet, 3DBiCar and DeformingThings4D datasets for several tasks demonstrate that our method effectively learns to generate quality 3D shapes with color and 4D mesh animations, improving over the current state-of-the-art. Source code will be released.

Via

Access Paper or Ask Questions

Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision

Nov 22, 2022

Congliang Li, Shijie Sun, Xiangyu Song, Huansheng Song, Naveed Akhtar, Ajmal Saeed Mian

Abstract:Multiple object detection and pose estimation are vital computer vision tasks. The latter relates to the former as a downstream problem in applications such as robotics and autonomous driving. However, due to the high complexity of both tasks, existing methods generally treat them independently, which is sub-optimal. We propose simultaneous neural modeling of both using monocular vision and 3D model infusion. Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network with a composite loss that also provides the advantages of anchor-free detections for efficient downstream pose estimation. To enable the annotation of training data for our learning objective, we develop a Twin-Space object labeling method and demonstrate its correctness analytically and empirically. Using the labeling method, we provide the KITTI-6DoF dataset with $\sim7.5$K annotated frames. Extensive experiments on KITTI-6DoF and the popular LineMod datasets show a consistent performance gain with SMOPE-Net over existing pose estimation methods. Here are links to our proposed SMOPE-Net, KITTI-6DoF dataset, and LabelImg3D labeling tool.

Via

Access Paper or Ask Questions