Abstract: Recent CLIP-guided 3D generation methods have achieved promising results but struggle to generate faithful 3D shapes that conform to the input text because of the gap between text and image embeddings. To this end, this paper proposes HOTS3D, which makes the first attempt to effectively bridge this gap by aligning text features to image features with spherical optimal transport (SOT). However, solving the SOT problem in high dimensions remains a challenge. To obtain the SOT map for the high-dimensional features produced by CLIP encoding of the two modalities, we mathematically formulate and derive the solution based on Villani's theorem, which can directly align two hyper-sphere distributions without manifold exponential maps. Furthermore, we implement it by leveraging input convex neural networks (ICNNs) for the optimal Kantorovich potential. With the optimally mapped features, a diffusion-based generator and a NeRF-based decoder are subsequently used to transform them into 3D shapes. Extensive quantitative and qualitative comparisons with state-of-the-art methods demonstrate the superiority of the proposed HOTS3D for 3D shape generation, especially in consistency with text semantics.
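To make the ICNN ingredient concrete, the following is a minimal sketch of a tiny input convex network in plain Python. It is not the HOTS3D architecture or its training objective: the class name `TinyICNN`, the layer sizes, and the random weights are all assumptions chosen for illustration. The only property shown is the one ICNNs rely on: with non-negative weights on the hidden activations, the scalar output is convex in the input.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TinyICNN:
    """Hypothetical one-hidden-layer input convex network.

    z = relu(A0 x + b0)
    f(x) = wz . z + a1 . x + b1,  with every entry of wz >= 0.

    relu of an affine map is convex, and a non-negative combination
    of convex functions plus an affine term stays convex in x.
    """

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.A0 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
        self.b0 = [rng.uniform(-1, 1) for _ in range(hidden)]
        # Non-negativity of these weights is what enforces convexity.
        self.wz = [abs(rng.uniform(-1, 1)) for _ in range(hidden)]
        self.a1 = [rng.uniform(-1, 1) for _ in range(dim)]
        self.b1 = rng.uniform(-1, 1)

    def __call__(self, x):
        pre = [s + b for s, b in zip(matvec(self.A0, x), self.b0)]
        z = [max(0.0, p) for p in pre]
        convex_part = sum(w * zi for w, zi in zip(self.wz, z))
        affine_part = sum(a * xi for a, xi in zip(self.a1, x)) + self.b1
        return convex_part + affine_part
```

In a potential-based OT solver, a network of this kind would parameterize the potential and the transport map would be read off from its gradient; that step, and any spherical projection, is omitted here.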
Abstract: This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connections. Existing implicit methods usually produce smooth, watertight meshes thanks to isosurface extraction algorithms~(e.g., Marching Cubes). However, these methods become memory- and computation-intensive as the resolution increases. Explicit methods are more efficient because they form faces directly from points. Nevertheless, the challenge of selecting appropriate faces from an enormous number of candidates often leads to undesirable faces and holes. Moreover, the reconstruction performance of both approaches tends to degrade as the point cloud becomes sparser. To this end, we propose MEsh Reconstruction via edGE~(MergeNet), which converts mesh reconstruction into local connectivity prediction problems. Specifically, MergeNet learns to extract the features of candidate edges and regress their distances to the underlying surface. The predicted distances are then used to identify the edges that lie on the surface. Finally, the meshes are reconstructed by refining the triangulations formed by these edges. Extensive experiments on synthetic and real-scanned datasets demonstrate the superiority of MergeNet over SoTA explicit methods.
Abstract: This paper presents COT-PCC, a novel point cloud compression method that formulates the task as a constrained optimal transport (COT) problem. COT-PCC takes the bitrate of the compressed features as an extra constraint on optimal transport (OT), which learns the distribution transformation between original and reconstructed points. Specifically, the formulated COT is implemented with a generative adversarial network (GAN) and a bitrate loss for training. The discriminator measures the Wasserstein distance between input and reconstructed points, and the generator calculates the optimal mapping between the distributions of the input and reconstructed point clouds. Moreover, we introduce a learnable sampling module for downsampling in the compression procedure. Extensive results on both sparse and dense point cloud datasets demonstrate that COT-PCC outperforms state-of-the-art methods in terms of both Chamfer distance (CD) and PSNR metrics. Source codes are available at \url{https://github.com/cognaclee/PCC-COT}.
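For readers unfamiliar with the CD metric, here is a minimal sketch of a Chamfer distance between two point sets in plain Python. Conventions differ across papers (squared or unsquared distances, sums or means), and the abstract does not state which one COT-PCC uses; this sketch assumes the symmetric mean of squared nearest-neighbor distances, and the function name `chamfer_distance` is hypothetical.

```python
def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention: mean squared nearest-neighbor distance from
    P to Q plus the same quantity from Q to P. Brute force, O(|P||Q|).
    """

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def one_way(A, B):
        # For each point in A, squared distance to its nearest point in B.
        return sum(min(sq_dist(a, b) for b in B) for a in A) / len(A)

    return one_way(P, Q) + one_way(Q, P)


original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(original, reconstructed))  # 1.0
```

Lower values mean the reconstructed cloud lies closer to the original; identical clouds give exactly zero.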
Abstract: Sampling from diffusion probabilistic models (DPMs) can be viewed as a piecewise distribution transformation, which generally requires hundreds or thousands of steps along the inverse diffusion trajectory to produce a high-quality image. Recent fast samplers for DPMs trade off sampling speed against sample quality through knowledge distillation or by adjusting the variance schedule or the denoising equation. However, they cannot be optimal in both aspects and often suffer from mode mixture when only a few steps are used. To tackle this problem, we regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose DPM-OT, a unified learning framework for fast DPMs with a direct expressway represented by an OT map, which can generate high-quality samples within around 10 function evaluations. By calculating the semi-discrete optimal transport map between the data latents and the white noise, we obtain an expressway from the prior distribution to the data distribution while significantly alleviating the problem of mode mixture. In addition, we give an error bound for the proposed method, which theoretically guarantees the stability of the algorithm. Extensive experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality (FID and mode mixture), thus representing an efficient solution for generative modeling. Source codes are available at https://github.com/cognaclee/DPM-OT
Abstract: With the widespread application of optimal transport (OT), its calculation becomes essential, and various algorithms have emerged. However, the existing methods either have low efficiency or cannot represent discontinuous maps. A novel reusable neural OT solver, OT-Net, is thus presented, which first learns Brenier's height representation via a neural network to obtain its potential, and then obtains the OT map by computing the gradient of the potential. The algorithm has two merits: 1) It can easily represent discontinuous maps, which allows it to match any target distribution with discontinuous supports and achieve sharp boundaries. This can effectively eliminate mode collapse in generative models. 2) The OT map can be calculated directly by the proposed algorithm when new target samples are added, which greatly improves the efficiency and reusability of the map. Moreover, the theoretical error bound of the algorithm is analyzed, and we demonstrate the empirical success of our approach in image generation, color transfer, and domain adaptation.
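The link between a height vector, a Brenier potential, and a discontinuous map can be sketched in a few lines of plain Python. The sketch assumes the standard semi-discrete form of the potential, u(x) = max_i (<x, y_i> + h_i), whose gradient at x is the target sample y_i that attains the maximum. The heights below are hand-picked for illustration, whereas the abstract describes learning the representation with a neural network; the function names are hypothetical and no training is shown.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def brenier_potential(x, targets, heights):
    """Piecewise-linear convex potential u(x) = max_i (<x, y_i> + h_i)."""
    return max(dot(x, y) + h for y, h in zip(targets, heights))


def ot_map(x, targets, heights):
    """Gradient of the potential at x: the target whose plane is on top.

    The map is constant inside each cell and jumps across cell
    boundaries, which is how a discontinuous map is represented.
    """
    scores = [dot(x, y) + h for y, h in zip(targets, heights)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return targets[best]


targets = [(-1.0, 0.0), (1.0, 0.0)]

# With equal heights the cell boundary is the line x[0] = 0.
print(ot_map((0.3, 0.5), targets, [0.0, 0.0]))   # (1.0, 0.0)
print(ot_map((-0.2, 1.0), targets, [0.0, 0.0]))  # (-1.0, 0.0)

# Raising the first height enlarges the cell mapped to the first target.
print(ot_map((0.1, 0.0), targets, [0.5, 0.0]))   # (-1.0, 0.0)
```

Adjusting the heights changes how much source mass each target receives, which is the degree of freedom a solver tunes so that the map pushes the source distribution onto the target one.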
Abstract: Intelligent mesh generation (IMG) refers to techniques that generate meshes by machine learning, a relatively new and promising research field. Within its short life span, IMG has greatly expanded the generalizability and practicality of mesh generation techniques and brought many breakthroughs and new possibilities for mesh generation. However, there is a lack of surveys focusing on IMG methods that cover recent works. In this paper, we present a systematic and comprehensive survey of the contemporary IMG landscape. Focusing on 110 preliminary IMG methods, we conduct an in-depth analysis and evaluation from multiple perspectives, including the core technique and application scope of the algorithm, agent learning goals, data types, targeted challenges, advantages, and limitations. To collect and classify the literature based on content extraction, we propose three different taxonomies from three views: key technique, output mesh unit element, and applicable input data types. Finally, we highlight some promising future research directions and challenges in IMG. For the convenience of readers, a project page of IMG is provided at \url{https://github.com/xzb030/IMG_Survey}.