Abstract: Rotation-invariant recognition of shapes is a common challenge in computer vision. Recent approaches have significantly improved the accuracy of rotation-invariant recognition by encoding the rotational invariance of shapes as hand-crafted image features and by introducing deep neural networks. However, pixel-based methods carry too much redundant information, and the critical geometric information is prone to early loss, resulting in weak rotation-invariant recognition of fine-grained shapes. In this paper, we reconsider the shape recognition problem from the perspective of contour points rather than pixels. We propose an anti-noise rotation-invariant convolution module based on contour geometric awareness for fine-grained shape recognition. The module divides the shape contour into multiple local geometric regions (LGA), within which we implement finer-grained rotation-invariant coding in terms of point topological relations. We build a deep network from five such cascaded modules and evaluate it in classification and retrieval experiments. The results show that our method performs well in rotation-invariant recognition of fine-grained shapes. In addition, we demonstrate that our method is robust to contour noise and to the choice of rotation center. The source code is available at https://github.com/zhenguonie/ANRICN_CGA.
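To illustrate why a code built from relations between contour points can be invariant to rotation about any center, here is a minimal sketch. It is not the module proposed in the abstract above: the descriptor, the function names `local_distance_code` and `rotate`, and the window size `k` are all hypothetical choices made for illustration. It relies only on the fact that distances between points are preserved by rotation.

```python
import numpy as np

def local_distance_code(contour, k=2):
    """Hypothetical descriptor (not the paper's): for each point of a
    closed contour, distances to its k preceding and k following points."""
    contour = np.asarray(contour, dtype=float)          # shape (N, 2)
    feats = []
    for shift in range(1, k + 1):
        # np.roll wraps around, so the contour is treated as closed.
        prev_pts = np.roll(contour, shift, axis=0)
        next_pts = np.roll(contour, -shift, axis=0)
        feats.append(np.linalg.norm(contour - prev_pts, axis=1))
        feats.append(np.linalg.norm(contour - next_pts, axis=1))
    return np.stack(feats, axis=1)                      # shape (N, 2k)

def rotate(points, angle, center=(0.0, 0.0)):
    """Rotate 2D points by `angle` radians about `center`."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    center = np.asarray(center, dtype=float)
    return (np.asarray(points, dtype=float) - center) @ rotation.T + center
```

Because the code uses only inter-point distances, `local_distance_code(contour)` and `local_distance_code(rotate(contour, angle, center))` agree up to floating-point error for any angle and any rotation center.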
Abstract: In 3D reconstruction from 2D images, a persistent challenge is to achieve high-precision reconstructions without relying on 3D ground-truth data. We present UNeR3D, an unsupervised methodology that generates detailed 3D reconstructions solely from 2D views. Our model significantly reduces the training costs tied to supervised approaches and introduces RGB coloration to 3D point clouds, enriching the visual experience. By employing an inverse distance weighting technique for color rendering, UNeR3D produces seamless color transitions, enhancing visual fidelity. Our model's flexible architecture supports training with any number of views, and reconstruction is not constrained by the number of views used during training: at inference, the model accepts an arbitrary number of views. Additionally, the model's continuous spatial input domain allows point clouds to be generated at any desired resolution, enabling the creation of high-resolution 3D RGB point clouds. We strengthen the reconstruction process with a novel multi-view geometric loss and a color loss, demonstrating that our model excels with single-view inputs and beyond. Our contributions represent a substantial advance in unsupervised learning for 3D vision and open new possibilities for content creation across diverse applications. Code is available at https://github.com/HongbinLin3589/UNeR3D.
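The inverse distance weighting mentioned above can be sketched generically as follows. This is textbook IDW interpolation, not necessarily the exact rendering step used in UNeR3D: the function name `idw_color`, the exponent `power`, and the stabilizer `eps` are assumptions introduced for illustration.

```python
import numpy as np

def idw_color(query, points, colors, power=2.0, eps=1e-8):
    """Generic inverse distance weighting (assumed form, not the paper's code).

    query:  (M, 3) positions at which to estimate a color
    points: (N, 3) positions with known colors
    colors: (N, 3) RGB values in [0, 1]
    """
    query = np.asarray(query, dtype=float)
    points = np.asarray(points, dtype=float)
    colors = np.asarray(colors, dtype=float)
    # Pairwise Euclidean distances between queries and known points: (M, N).
    dist = np.linalg.norm(query[:, None, :] - points[None, :, :], axis=2)
    # Closer points get larger weights; eps avoids division by zero.
    weights = 1.0 / (dist ** power + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each query color is a convex combination of the known colors.
    return weights @ colors
```

Since the weights vary smoothly with distance and sum to one, the interpolated color changes gradually between neighboring points, which is the kind of seamless transition the abstract describes.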