Abstract:This paper proposes a novel method to refine the 6D pose estimation inferred by an instance-level deep neural network which processes a single RGB image and that has been trained on synthetic images only. The proposed optimization algorithm usefully exploits the depth measurement of a standard RGB-D camera to estimate the dimensions of the considered object, even though the network is trained on a single CAD model of the same object with given dimensions. The improved accuracy in the pose estimation allows a robot to grasp apples of various types and significantly different dimensions successfully; this was not possible using the standard pose estimation algorithm, except for the fruits with dimensions very close to those of the CAD drawing used in the training process. Grasping fresh fruits without damaging each item also demands a suitable grasp force control. A parallel gripper equipped with special force/tactile sensors is thus adopted to achieve safe grasps with the minimum force necessary to lift the fruits without any slippage and any deformation at the same time, with no knowledge of their weight.