Abstract:Robotic grasping in the open world is a critical component of manufacturing and automation processes. While numerous existing approaches depend on 2D segmentation output to facilitate the grasping procedure, accurately determining depth from 2D imagery remains a challenge, often leading to limited performance in complex stacking scenarios. In contrast, techniques utilizing 3D point cloud data inherently capture depth information, thus enabling adeptly navigating and manipulating a diverse range of complex stacking scenes. However, such efforts are considerably hindered by the variance in data capture devices and the unstructured nature of the data, which limits their generalizability. Consequently, much research is narrowly concentrated on managing designated objects within specific settings, which confines their real-world applicability. This paper presents a novel pipeline capable of executing object grasping tasks in open-world scenarios even on previously unseen objects without the necessity for training. Additionally, our pipeline supports the flexible use of different 3D point cloud segmentation models across a variety of scenes. Leveraging the segmentation results, we propose to engage a training-free binary clustering algorithm that not only improves segmentation precision but also possesses the capability to cluster and localize unseen objects for executing grasping operations. In our experiments, we investigate a range of open-world scenarios, and the outcomes underscore the remarkable robustness and generalizability of our pipeline, consistent across various environments, robots, cameras, and objects. The code will be made available upon acceptance of the paper.
Abstract:Generalized category discovery presents a challenge in a realistic scenario, which requires the model's generalization ability to recognize unlabeled samples from known and unknown categories. This paper revisits the challenge of generalized category discovery through the lens of information maximization (InfoMax) with a probabilistic parametric classifier. Our findings reveal that ensuring independence between known and unknown classes while concurrently assuming a uniform probability distribution across all classes, yields an enlarged margin among known and unknown classes that promotes the model's performance. To achieve the aforementioned independence, we propose a novel InfoMax-based method, Regularized Parametric InfoMax (RPIM), which adopts pseudo labels to supervise unlabeled samples during InfoMax, while proposing a regularization to ensure the quality of the pseudo labels. Additionally, we introduce novel semantic-bias transformation to refine the features from the pre-trained model instead of direct fine-tuning to rescue the computational costs. Extensive experiments on six benchmark datasets validate the effectiveness of our method. RPIM significantly improves the performance regarding unknown classes, surpassing the state-of-the-art method by an average margin of 3.5%.