Abstract: Point cloud representations have gained traction due to their efficient memory usage and simplicity in acquisition, manipulation, and storage. However, as point cloud sizes increase, effective down-sampling becomes essential to meet the computational requirements of downstream tasks. Classical approaches, such as furthest point sampling (FPS), perform well on benchmarks but rely on heuristics and overlook geometric features, such as curvature, during down-sampling. In this paper, we introduce a reinforcement learning-based sampling algorithm that enhances FPS by integrating curvature information. Our approach ranks points by combining FPS-derived soft ranks with curvature scores computed by a deep neural network, allowing us to replace a proportion of low-curvature points in the FPS set with high-curvature points from the unselected set. Existing differentiable sampling techniques often suffer from training instability, which hinders their integration into end-to-end learning frameworks. By contrast, our method achieves stable end-to-end learning and consistently outperforms baseline models across multiple downstream geometry processing tasks. We provide comprehensive ablation studies, with both qualitative and quantitative insights into the effect of each feature on performance. Our algorithm establishes state-of-the-art results for classification, segmentation, and shape completion, showcasing its robustness and adaptability.
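The replacement step described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the curvature scores are assumed to be given as an array (the paper computes them with a deep neural network), the swap is decided by curvature alone rather than by combined soft ranks, and the reinforcement learning component is omitted entirely. The function names `farthest_point_sampling` and `curvature_swap` and the `ratio` parameter are hypothetical.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return indices of k points, beginning at index `start`."""
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, k):
        nxt = int(np.argmax(min_dist))
        selected[i] = nxt
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


def curvature_swap(points, curvature, k, ratio=0.25):
    """Swap the lowest-curvature share of the FPS set for the
    highest-curvature unselected points (simplified, curvature only)."""
    n = points.shape[0]
    fps_idx = farthest_point_sampling(points, k)
    # Number of points to exchange, capped by how many are unselected.
    n_swap = min(int(ratio * k), n - k)
    if n_swap <= 0:
        return fps_idx

    in_fps = np.zeros(n, dtype=bool)
    in_fps[fps_idx] = True
    rest_idx = np.flatnonzero(~in_fps)

    # Drop the n_swap FPS points with the lowest curvature.
    keep = fps_idx[np.argsort(curvature[fps_idx])[n_swap:]]
    # Add the n_swap unselected points with the highest curvature.
    add = rest_idx[np.argsort(curvature[rest_idx])[::-1][:n_swap]]
    return np.concatenate([keep, add])
```

With `ratio=0` the function reduces to plain FPS, which makes it easy to compare the two samplers on the same input.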
Abstract: Robot learning tasks are extremely compute-intensive and hardware-specific. Tackling these challenges with a diverse dataset of offline demonstrations that can be used to train robot manipulation agents is therefore very appealing. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training, composed mostly of expert data, along with benchmark scores of common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm that enhances a behaviour cloning agent with diffusion-based policy learning, and we measure the efficacy of our method on real physical robots online at test time. This is also our official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representations and agent policies. In our experiments, we found that a MOCO-finetuned ResNet50 performs best among the finetuned representations. Goal state conditioning and mapping to transitions resulted in a marginal increase in success rate and mean reward. As for the agent policy, we developed DiffClone, a behaviour cloning agent improved using conditional diffusion.