Abstract:Autonomous under-canopy navigation faces additional challenges compared to over-canopy settings - for example the tight spacing between the crop rows, degraded GPS accuracy and excessive clutter. Keypoint-based visual navigation has been shown to perform well in these conditions, however the differences between agricultural environments in terms of lighting, season, soil and crop type mean that a domain shift will likely be encountered at some point of the robot deployment. In this paper, we explore the use of Meta-Learning to overcome this domain shift using a minimal amount of data. We train a base-learner that can quickly adapt to new conditions, enabling more robust navigation in low-data regimes.
Abstract:Under-canopy agricultural robots can enable various applications like precise monitoring, spraying, weeding, and plant manipulation tasks throughout the growing season. Autonomous navigation under the canopy is challenging due to the degradation in accuracy of RTK-GPS and the large variability in the visual appearance of the scene over time. In prior work, we developed a supervised learning-based perception system with semantic keypoint representation and deployed this in various field conditions. A large number of failures of this system can be attributed to the inability of the perception model to adapt to the domain shift encountered during deployment. In this paper, we propose a self-supervised online adaptation method for adapting the semantic keypoint representation using a visual foundational model, geometric prior, and pseudo labeling. Our preliminary experiments show that with minimal data and fine-tuning of parameters, the keypoint prediction model trained with labels on the source domain can be adapted in a self-supervised manner to various challenging target domains onboard the robot computer using our method. This can enable fully autonomous row-following capability in under-canopy robots across fields and crops without requiring human intervention.
Abstract:Under-canopy agricultural robots require robust navigation capabilities to enable full autonomy but struggle with tight row turning between crop rows due to degraded GPS reception, visual aliasing, occlusion, and complex vehicle dynamics. We propose an imitation learning approach using diffusion policies to learn row turning behaviors from demonstrations provided by human operators or privileged controllers. Simulation experiments in a corn field environment show potential in learning this task with only visual observations and velocity states. However, challenges remain in maintaining control within rows and handling varied initial conditions, highlighting areas for future improvement.
Abstract:Tracking plant features is crucial for various agricultural tasks like phenotyping, pruning, or harvesting, but the unstructured, cluttered, and deformable nature of plant environments makes it a challenging task. In this context, the recent advancements in foundational models show promise in addressing this challenge. In our work, we propose PlantTrack where we utilize DINOv2 which provides high-dimensional features, and train a keypoint heatmap predictor network to identify the locations of semantic features such as fruits and leaves which are then used as prompts for point tracking across video frames using TAPIR. We show that with as few as 20 synthetic images for training the keypoint predictor, we achieve zero-shot Sim2Real transfer, enabling effective tracking of plant features in real environments.
Abstract:Successful deployment of mobile robots in unstructured domains requires an understanding of the environment and terrain to avoid hazardous areas, getting stuck, and colliding with obstacles. Traversability estimation--which predicts where in the environment a robot can travel--is one prominent approach that tackles this problem. Existing geometric methods may ignore important semantic considerations, while semantic segmentation approaches involve a tedious labeling process. Recent self-supervised methods reduce labeling tedium, but require additional data or models and tend to struggle to explicitly label untraversable areas. To address these limitations, we introduce a weakly-supervised method for relative traversability estimation. Our method involves manually annotating the relative traversability of a small number of point pairs, which significantly reduces labeling effort compared to traditional segmentation-based methods and avoids the limitations of self-supervised methods. We further improve the performance of our method through a novel cross-image labeling strategy and loss function. We demonstrate the viability and performance of our method through deployment on a mobile robot in outdoor environments.
Abstract:We present a vision-based navigation system for under-canopy agricultural robots using semantic keypoints. Autonomous under-canopy navigation is challenging due to the tight spacing between the crop rows ($\sim 0.75$ m), degradation in RTK-GPS accuracy due to multipath error, and noise in LiDAR measurements from the excessive clutter. Our system, CropFollow++, introduces modular and interpretable perception architecture with a learned semantic keypoint representation. We deployed CropFollow++ in multiple under-canopy cover crop planting robots on a large scale (25 km in total) in various field conditions and we discuss the key lessons learned from this.