Abstract:With robots increasingly collaborating with humans in everyday tasks, it is important to take steps toward robotic systems capable of understanding the environment. This work focuses on scene understanding to detect pick and place tasks given initial and final images from the scene. To this end, a dataset is collected for object detection and pick and place task detection. A YOLOv5 network is subsequently trained to detect the objects in the initial and final scenes. Given the detected objects and their bounding boxes, two methods are proposed to detect the pick and place tasks which transform the initial scene into the final scene. A geometric method is proposed which tracks objects' movements in the two scenes and works based on the intersection of the bounding boxes which moved within scenes. Contrarily, the CNN-based method utilizes a Convolutional Neural Network to classify objects with intersected bounding boxes into 5 classes, showing the spatial relationship between the involved objects. The performed pick and place tasks are then derived from analyzing the experiments with both scenes. Results show that the CNN-based method, using a VGG16 backbone, outscores the geometric method by roughly 12 percentage points in certain scenarios, with an overall success rate of 84.3%.
Abstract:Humans, this species expert in grasp detection, can grasp objects by taking into account hand-object positioning information. This work proposes a method to enable a robot manipulator to learn the same, grasping objects in the most optimal way according to how the gripper has approached the object. Built on deep learning, the proposed method consists of two main stages. In order to generalize the network on unseen objects, the proposed Approach-based Grasping Inference involves an element decomposition stage to split an object into its main parts, each with one or more annotated grasps for a particular approach of the gripper. Subsequently, a grasp detection network utilizes the decomposed elements by Mask R-CNN and the information on the approach of the gripper in order to detect the element the gripper has approached and the most optimal grasp. In order to train the networks, the study introduces a robotic grasping dataset collected in the Coppeliasim simulation environment. The dataset involves 10 different objects with annotated element decomposition masks and grasp rectangles. The proposed method acquires a 90% grasp success rate on seen objects and 78% on unseen objects in the Coppeliasim simulation environment. Lastly, simulation-to-reality domain adaptation is performed by applying transformations on the training set collected in simulation and augmenting the dataset, which results in a 70% physical grasp success performance using a Delta parallel robot and a 2 -fingered gripper.
Abstract:According to WHO[1], since the 1970s, diagnosis of melanoma skin cancer has been more frequent. However, if detected early, the 5-year survival rate for melanoma can increase to 99 percent. In this regard, skin lesion segmentation can be pivotal in monitoring and treatment planning. In this work, ten models and four augmentation configurations are trained on the ISIC 2016 dataset. The performance and overfitting are compared utilizing five metrics. Our results show that the U-Net-Resnet50 and the R2U-Net have the highest metrics value, along with two data augmentation scenarios. We also investigate CBAM and AG blocks in the U-Net architecture, which enhances segmentation performance at a meager computational cost. In addition, we propose using pyramid, AG, and CBAM blocks in a sequence, which significantly surpasses the results of using the two individually. Finally, our experiments show that models that have exploited attention modules successfully overcome common skin lesion segmentation problems. Lastly, in the spirit of reproducible research, we implement models and codes publicly available.