Osaka University
Abstract:Robotic packaging using wrapping paper poses significant challenges due to the material's complex deformation properties. The packaging process itself involves multiple steps, primarily categorized as folding the paper or creating creases. Small deviations in the robot's arm trajectory or force vector can lead to tearing or wrinkling of the paper, exacerbated by the variability in material properties. This study introduces a novel framework that combines imitation learning and reinforcement learning to enable a robot to perform each step of the packaging process efficiently. The framework allows the robot to follow approximate trajectories of the tool-center point (TCP) based on human demonstrations while optimizing force control parameters to prevent tearing or wrinkling, even with variable wrapping paper materials. The proposed method was validated through ablation studies, which demonstrated successful task completion with a significant reduction in tear and wrinkle rates. Furthermore, the force control strategy proved to be adaptable across different wrapping paper materials and robust against variations in the size of the target object.
Abstract:Robotic grasping is facing a variety of real-world uncertainties caused by non-static object states, unknown object properties, and cluttered object arrangements. The difficulty of grasping increases with the presence of more uncertainties, where commonly used learning-based approaches struggle to perform consistently across varying conditions. In this study, we integrate the idea of similarity matching to tackle the challenge of grasping novel objects that are simultaneously in motion and densely cluttered using a single RGBD camera, where multiple uncertainties coexist. We achieve this by shifting visual detection from global to local states and operating grasp planning from static to dynamic scenes. Notably, we introduce optimization methods to enhance planning efficiency for this time-sensitive task. Our proposed system can adapt to various object types, arrangements and movement speeds without the need for extensive training, as demonstrated by real-world experiments.
Abstract:Mobile grasping enhances manipulation efficiency by utilizing robots' mobility. This study aims to enable a commercial off-the-shelf robot for mobile grasping, requiring precise timing and pose adjustments. Self-supervised learning can develop a generalizable policy to adjust the robot's velocity and determine grasp position and orientation based on the target object's shape and pose. Due to mobile grasping's complexity, action primitivization and step-by-step learning are crucial to avoid data sparsity in learning from trial and error. This study simplifies mobile grasping into two grasp action primitives and a moving action primitive, which can be operated with limited degrees of freedom for the manipulator. This study introduces three fully convolutional neural network (FCN) models to predict static grasp primitive, dynamic grasp primitive, and residual moving velocity error from visual inputs. A two-stage grasp learning approach facilitates seamless FCN model learning. The ablation study demonstrated that the proposed method achieved the highest grasping accuracy and pick-and-place efficiency. Furthermore, randomizing object shapes and environments in the simulation effectively achieved generalizable mobile grasping.
Abstract:This study explores a pick-and-toss (PT) as an alternative to pick-and-place (PP), allowing a robot to extend its range and improve task efficiency. Although PT boosts efficiency in object arrangement, the placement environment critically affects the success of tossing. To achieve accurate and efficient object arrangement, we suggest choosing between PP and PT based on task difficulty estimated from the placement environment. Our method simultaneously learns the tossing motion through self-supervised learning and the task determination policy via brute-force search. Experimental results validate the proposed method through simulations and real-world tests on various rectangular object arrangements.
Abstract:Robotic waste sorting poses significant challenges in both perception and manipulation, given the extreme variability of objects that should be recognized on a cluttered conveyor belt. While deep learning has proven effective in solving complex tasks, the necessity for extensive data collection and labeling limits its applicability in real-world scenarios like waste sorting. To tackle this issue, we introduce a data augmentation method based on a novel GAN architecture called wasteGAN. The proposed method allows to increase the performance of semantic segmentation models, starting from a very limited bunch of labeled examples, such as few as 100. The key innovations of wasteGAN include a novel loss function, a novel activation function, and a larger generator block. Overall, such innovations helps the network to learn from limited number of examples and synthesize data that better mirrors real-world distributions. We then leverage the higher-quality segmentation masks predicted from models trained on the wasteGAN synthetic data to compute semantic-aware grasp poses, enabling a robotic arm to effectively recognizing contaminants and separating waste in a real-world scenario. Through comprehensive evaluation encompassing dataset-based assessments and real-world experiments, our methodology demonstrated promising potential for robotic waste sorting, yielding performance gains of up to 5.8\% in picking contaminants. The project page is available at https://github.com/bach05/wasteGAN.git
Abstract:Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labelled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.
Abstract:This study tasckles the problem of many-objective sequence optimization for semi-automated robotic disassembly operations. To this end, we employ a many-objective genetic algorithm (MaOGA) algorithm inspired by the Non-dominated Sorting Genetic Algorithm (NSGA)-III, along with robotic-disassembly-oriented constraints and objective functions derived from geometrical and robot simulations using 3-dimensional (3D) geometrical information stored in a 3D Computer-Aided Design (CAD) model of the target product. The MaOGA begins by generating a set of initial chromosomes based on a contact and connection graph (CCG), rather than random chromosomes, to avoid falling into a local minimum and yield repeatable convergence. The optimization imposes constraints on feasibility and stability as well as objective functions regarding difficulty, efficiency, prioritization, and allocability to generate a sequence that satisfies many preferred conditions under mandatory requirements for semi-automated robotic disassembly. The NSGA-III-inspired MaOGA also utilizes non-dominated sorting and niching with reference lines to further encourage steady and stable exploration and uniformly lower the overall evaluation values. Our sequence generation experiments for a complex product (36 parts) demonstrated that the proposed method can consistently produce feasible and stable sequences with a 100% success rate, bringing the multiple preferred conditions closer to the optimal solution required for semi-automated robotic disassembly operations.
Abstract:To safely and efficiently extract an object from the clutter, this paper presents a bimanual manipulation planner in which one hand of the robot is used to slide the target object out of the clutter while the other hand is used to support the surrounding objects to prevent the clutter from collapsing. Our method uses a neural network to predict the physical phenomena of the clutter when the target object is moved. We generate the most efficient action based on the Monte Carlo tree search.The grasping and sliding actions are planned to minimize the number of motion sequences to pick the target object. In addition, the object to be supported is determined to minimize the position change of surrounding objects. Experiments with a real bimanual robot confirmed that the robot could retrieve the target object, reducing the total number of motion sequences and improving safety.
Abstract:Training deep-learning-based vision systems requires the manual annotation of a significant amount of data to optimize several parameters of the deep convolutional neural networks. Such manual annotation is highly time-consuming and labor-intensive. To reduce this burden, a previous study presented a fully automated annotation approach that does not require any manual intervention. The proposed method associates a visual marker with an object and captures it in the same image. However, because the previous method relied on moving the object within the capturing range using a fixed-point camera, the collected image dataset was limited in terms of capturing viewpoints. To overcome this limitation, this study presents a mobile application-based free-viewpoint image-capturing method. With the proposed application, users can collect multi-view image datasets automatically that are annotated with bounding boxes by moving the camera. However, capturing images through human involvement is laborious and monotonous. Therefore, we propose gamified application features to track the progress of the collection status. Our experiments demonstrated that using the gamified mobile application for bounding box annotation, with visible collection progress status, can motivate users to collect multi-view object image datasets with less mental workload and time pressure in an enjoyable manner, leading to increased engagement.
Abstract:Robotic pick-and-place has been researched for a long time to cope with uncertainty of novel objects and changeable environments. Past works mainly focus on learning-based methods to achieve high precision. However, they have difficulty being generalized for the limitation of specified training models. To break through this drawback of learning-based approaches, we introduce a new perspective of similarity matching between novel objects and a known database based on category-association to achieve pick-and-place tasks with high accuracy and stabilization. We calculate the category name similarity using word embedding to quantify the semantic similarity between the categories of known models and the target real-world objects. With a similar model identified by a similarity prediction function, we preplan a series of robust grasps and imitate them to plan new grasps on the real-world target object. We also propose a distance-based method to infer the in-hand posture of objects and adjust small rotations to achieve stable placements under uncertainty. Through a real-world robotic pick-and-place experiment with a dozen of in-category and out-of-category novel objects, our method achieved an average success rate of 90.6% and 75.9% respectively, validating the capacity of generalization to diverse objects.