Abstract: This paper introduces a 3D shape completion approach based on a 3D latent diffusion model that completes shapes, represented as Truncated Signed Distance Functions (TSDFs), from partial 3D scans. Our method combines image-based conditioning through cross-attention with spatial conditioning through the integration of 3D features from the captured partial scans. This dual guidance enables high-fidelity, realistic shape completions at higher resolutions. At the core of our approach is the compression of 3D data into a low-dimensional latent space using an auto-encoder inspired by 2D latent diffusion models. This compression makes it feasible to process higher-resolution shapes and allows a single model to be applied across multiple object classes, a significant improvement over existing diffusion-based shape completion methods, which often require a separate diffusion model for each class. We validate our approach on two common shape completion benchmarks, where it performs on par with state-of-the-art methods in terms of accuracy and realism while operating at a higher resolution and using a single model for all object classes. We present a comprehensive evaluation of our model, showing that it handles diverse shape completion challenges, including object classes unseen during training. The code will be released upon acceptance.
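
For readers unfamiliar with the shape representation named above, the following is a minimal sketch of a TSDF grid, not the authors' pipeline. It samples the signed distance to a sphere on a regular voxel grid and clamps it to a truncation band. The function name `sphere_tsdf`, the grid extent of [-1, 1]^3, the resolution, and the truncation value are all illustrative assumptions; the abstract does not specify them.

```python
import math


def sphere_tsdf(resolution, radius, truncation):
    """Return a resolution^3 nested list of truncated signed distances.

    Hypothetical helper for illustration only. The grid spans [-1, 1]^3.
    Values are negative inside the sphere, positive outside, and clamped
    to the interval [-truncation, truncation].
    """
    # Voxel centres along one axis of the cube [-1, 1].
    step = 2.0 / resolution
    coords = [-1.0 + (i + 0.5) * step for i in range(resolution)]

    grid = []
    for x in coords:
        plane = []
        for y in coords:
            row = []
            for z in coords:
                # Exact signed distance to a sphere centred at the origin.
                sdf = math.sqrt(x * x + y * y + z * z) - radius
                # Truncation: keep detail only in a thin band around the surface.
                row.append(max(-truncation, min(truncation, sdf)))
            plane.append(row)
        grid.append(plane)
    return grid


# Assumed example values: a 16^3 grid, radius 0.5, truncation band 0.1.
tsdf = sphere_tsdf(resolution=16, radius=0.5, truncation=0.1)
```

Voxels far from the surface saturate at plus or minus the truncation value, so only the narrow band around the surface carries geometric detail. In a partial scan, such a grid would typically be built from observed depth data rather than from an analytic shape as in this sketch.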