Abstract: Strawberry harvesting robots faced persistent challenges such as low integration of visual perception, fruit-gripper misalignment, empty grasping, and strawberry slippage from the gripper due to insufficient gripping force, all of which compromised harvesting stability and efficiency in orchard environments. To overcome these issues, this paper proposed a visual fault diagnosis and self-recovery framework that integrated multi-task perception with corrective control strategies. At the core of this framework was SRR-Net, an end-to-end multi-task perception model that simultaneously performed strawberry detection, segmentation, and ripeness estimation, thereby unifying visual perception with fault diagnosis. Building on this integrated perception, a relative error compensation method using simultaneous target-gripper detection was designed to address positional misalignment, correcting deviations when the error exceeded the tolerance threshold. To mitigate empty grasping and fruit-slippage faults, an early abort strategy was implemented. A micro-optical camera embedded in the end-effector provided real-time visual feedback, enabling grasp detection during the deflating stage and strawberry slip prediction during snap-off using a MobileNet V3-Small classifier and a time-series LSTM classifier. Experiments demonstrated that SRR-Net maintained high perception accuracy. For detection, it achieved a precision of 0.895 and a recall of 0.813 on strawberries, and 0.972/0.958 on hands. For segmentation, it yielded a precision of 0.887 and a recall of 0.747 for strawberries, and 0.974/0.947 for hands. For ripeness estimation, SRR-Net attained a mean absolute error of 0.035, while simultaneously supporting multi-task perception and sustaining a competitive inference speed of 163.35 FPS.
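The relative error compensation idea can be illustrated with a minimal sketch. It assumes the target and the gripper are both available as pixel bounding boxes from the same frame, and it applies a proportional correction only when the offset between their centers exceeds a tolerance. The names `Box`, `relative_error`, and `correction_step`, as well as the tolerance and gain values, are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixels (top-left and bottom-right corners)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self):
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def relative_error(target, gripper):
    """Pixel offset from the gripper center to the target center."""
    tx, ty = target.center()
    gx, gy = gripper.center()
    return (tx - gx, ty - gy)


def correction_step(target, gripper, tolerance_px=10.0, gain=0.5):
    """Return a proportional correction, or zero if the error is within tolerance.

    tolerance_px and gain are illustrative values (assumptions), not the paper's.
    """
    ex, ey = relative_error(target, gripper)
    if math.hypot(ex, ey) <= tolerance_px:
        return (0.0, 0.0)  # aligned well enough, no correction needed
    return (gain * ex, gain * ey)


if __name__ == "__main__":
    strawberry = Box(100, 100, 140, 140)
    gripper = Box(60, 100, 100, 140)
    print(correction_step(strawberry, gripper))
```

Because both boxes come from the same image, the offset is a relative quantity, so a fixed bias in the camera pose affects target and gripper alike and largely cancels out. Mapping the pixel correction to a robot motion would still need a calibration or scale factor, which is omitted here.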




Abstract: This paper addresses the dynamic tracking problem of SNAP orchard harvesting robots with multiple uncalibrated model parameters, in the context of dwarf-culture orchard harvesting. A new hybrid visual servoing adaptive tracking controller and three adaptive laws are proposed to ensure that the harvesting robot completes the dynamic harvesting task while adapting to unknown parameters, including the camera intrinsic and extrinsic parameters and the robot dynamics. Using Lyapunov theory, asymptotic convergence of the closed-loop system under the proposed control scheme is rigorously proven. Experiments and simulations have been conducted to verify the performance of the proposed control scheme, and the results demonstrate its effectiveness and superiority.
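For readers unfamiliar with how an adaptive law and a Lyapunov argument fit together, the following scalar textbook example may help. It is not the controller proposed in the paper; the plant, the gains $k$ and $\gamma$, and all symbols are assumptions chosen only for illustration.

```latex
% Scalar plant with unknown constant parameter a (illustrative, not the paper's model)
\dot{x} = a x + u
% Certainty-equivalence control with estimate \hat{a} and gain k > 0
u = -\hat{a} x - k x
% Closed loop, with parameter error \tilde{a} = a - \hat{a}
\dot{x} = \tilde{a} x - k x
% Adaptive law with adaptation gain \gamma > 0
\dot{\hat{a}} = \gamma x^{2}
% Lyapunov function candidate
V = \tfrac{1}{2} x^{2} + \tfrac{1}{2\gamma} \tilde{a}^{2}
% Derivative along trajectories, using \dot{\tilde{a}} = -\dot{\hat{a}}
\dot{V} = x(\tilde{a} x - k x) - \tfrac{1}{\gamma} \tilde{a} \, \gamma x^{2} = -k x^{2} \le 0
```

Since $\dot{V} \le 0$, both $x$ and $\tilde{a}$ stay bounded, and Barbalat's lemma then gives $x \to 0$ asymptotically, even though $\hat{a}$ need not converge to $a$.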




Abstract: RGB-D cameras have been successfully used for indoor High-ThroughpuT Phenotyping (HTTP). However, their capability and feasibility for in-field HTTP still need to be evaluated because of the noise and disturbances caused by unstable illumination, specular reflection, diffuse reflection, and similar factors. To address this, we evaluated the depth-ranging performance of two consumer-level RGB-D cameras (RealSense D435i and Kinect V2) in in-field HTTP scenarios and proposed a strategy to compensate for the depth measurement error. For performance evaluation, we focused on determining their optimal ranging areas for different crop organs. Based on the evaluation results, we proposed a brightness-and-distance-based Support Vector Regression strategy to compensate for the ranging error. Furthermore, we analyzed the depth filling rate of the two RGB-D cameras under different lighting intensities. Experimental results showed that: 1) the RealSense D435i has an effective ranging area of [0.160, 1.400] m and an in-field filling rate of approximately 90%; 2) the Kinect V2 has high ranging accuracy in the range [0.497, 1.200] m, but its in-field filling rate is less than 24.9%; 3) our error compensation model effectively reduces the influence of lighting intensity and target distance, with a maximum MSE of 0.029 and a minimum R² of 0.867. In summary, the RealSense D435i shows better ranging performance than the Kinect V2 for in-field HTTP.
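The three quantities reported above (depth filling rate, MSE, and R²) can be computed as in the sketch below. It assumes the filling rate is the fraction of pixels with a valid depth reading and that invalid pixels are stored as 0; the abstract does not give the exact definition, so both points are assumptions, and the function names are hypothetical. The Support Vector Regression model itself is not reproduced here.

```python
def fill_rate(depth_rows, invalid=0):
    """Fraction of pixels with a valid depth value.

    depth_rows is a list of rows of depth readings; `invalid` marks missing
    depth (assumed to be 0 here).
    """
    total = sum(len(row) for row in depth_rows)
    if total == 0:
        raise ValueError("empty depth map")
    valid = sum(1 for row in depth_rows for d in row if d != invalid)
    return valid / total


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    depth_mm = [[0, 500, 510], [0, 0, 495]]  # toy 2x3 depth map, 0 = no reading
    print(fill_rate(depth_mm))
    print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
    print(r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

A lower MSE and an R² closer to 1 indicate a better fit, which is why the abstract reports the maximum MSE and the minimum R² as worst-case figures for the compensation model.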