Abstract:Large and rich data is a prerequisite for effective training of deep neural networks. However, the irregularity of point cloud data makes manual annotation time-consuming and laborious. Self-supervised representation learning, which leverages the intrinsic structure of large-scale unlabelled data to learn meaningful feature representations, has attracted increasing attention in the field of point cloud research. However, self-supervised representation learning often suffers from model collapse, resulting in reduced information and diversity of the learned representation, and consequently degrading the performance of downstream tasks. To address this problem, we propose PointMoment, a novel framework for point cloud self-supervised representation learning that utilizes a high-order mixed moment loss function rather than the conventional contrastive loss function. Moreover, our framework does not require any special techniques such as asymmetric network architectures, gradient stopping, etc. Specifically, we calculate the high-order mixed moment of the feature variables and force them to decompose into products of their individual moment, thereby making multiple variables more independent and minimizing the feature redundancy. We also incorporate a contrastive learning approach to maximize the feature invariance under different data augmentations of the same point cloud. Experimental results show that our approach outperforms previous unsupervised learning methods on the downstream task of 3D point cloud classification and segmentation.
Abstract:Most deep learning-based point cloud processing methods are supervised and require large scale of labeled data. However, manual labeling of point cloud data is laborious and time-consuming. Self-supervised representation learning can address the aforementioned issue by learning robust and generalized representations from unlabeled datasets. Nevertheless, the embedded features obtained by representation learning usually contain redundant information, and most current methods reduce feature redundancy by linear correlation constraints. In this paper, we propose PointJEM, a self-supervised representation learning method applied to the point cloud field. PointJEM comprises an embedding scheme and a loss function based on joint entropy. The embedding scheme divides the embedding vector into different parts, each part can learn a distinctive feature. To reduce redundant information in the features, PointJEM maximizes the joint entropy between the different parts, thereby rendering the learned feature variables pairwise independent. To validate the effectiveness of our method, we conducted experiments on multiple datasets. The results demonstrate that our method can significantly reduce feature redundancy beyond linear correlation. Furthermore, PointJEM achieves competitive performance in downstream tasks such as classification and segmentation.