Multiscale Medical Robotics Center, Hong Kong, China
Abstract: This work proposes DOFS, a pilot dataset of 3D deformable objects (DOs) (e.g., elasto-plastic objects) with full spatial information (i.e., top, side, and bottom views), collected with a novel, low-cost data collection platform featuring a transparent operating plane. The dataset consists of active manipulation actions, multi-view RGB-D images, well-registered point clouds, 3D deformed meshes, and 3D occupancy with semantics, acquired using a pinching strategy with a two-parallel-finger gripper. In addition, we trained a neural network that takes the down-sampled 3D occupancy and the action as input to model the dynamics of an elasto-plastic object. Our dataset and all CAD models of the data collection system will be released on our website.
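As a minimal sketch of such a dynamics model (the `OccupancyDynamics` architecture, 16^3 grid resolution, and 7-D action below are illustrative assumptions, not the trained network from the paper), a network mapping a down-sampled occupancy grid and a pinch action to the next occupancy state could look like:

```python
# Sketch of an occupancy-based dynamics model (illustrative only; layer
# sizes, grid resolution, and action dimension are assumptions).
import torch
import torch.nn as nn

class OccupancyDynamics(nn.Module):
    def __init__(self, grid=16, action_dim=7):
        super().__init__()
        self.enc = nn.Sequential(  # encode the 3D occupancy grid
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * (grid // 4) ** 3
        self.head = nn.Sequential(  # fuse the action, decode next occupancy
            nn.Linear(feat + action_dim, 256), nn.ReLU(),
            nn.Linear(256, grid ** 3), nn.Sigmoid(),
        )
        self.grid = grid

    def forward(self, occ, action):
        z = self.enc(occ)                   # (B, feat)
        x = torch.cat([z, action], dim=-1)  # condition on the pinch action
        return self.head(x).view(-1, 1, self.grid, self.grid, self.grid)

model = OccupancyDynamics()
occ = torch.rand(2, 1, 16, 16, 16)  # down-sampled occupancy batch
act = torch.rand(2, 7)              # gripper pose + pinch width (assumed)
next_occ = model(occ, act)          # predicted occupancy after the pinch
```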
Abstract: Ultrasound-guided percutaneous needle insertion is a standard procedure employed in both biopsy and ablation in clinical practice. However, due to the complex interaction between tissue and instrument, the needle may deviate from the in-plane view, preventing close monitoring of the percutaneous needle. To address this challenge, we introduce a robot-assisted ultrasound (US) imaging system designed to seamlessly monitor the insertion process and autonomously restore the visibility of the inserted instrument when misalignment occurs. To this end, an adversarial structure is presented to encourage the generation of segmentation masks that align consistently with the ground truth in high-order space. This study also systematically investigates the effects of various training loss functions and their combinations on segmentation performance. When misalignment between the probe and the percutaneous needle is detected, the robot is triggered to perform a transverse search that optimizes the positional and rotational adjustments needed to restore needle visibility. Experimental results on ex-vivo porcine samples demonstrate that the proposed method can precisely segment the percutaneous needle (with a tip error of $0.37\pm0.29$ mm and an angle error of $1.19\pm0.29^{\circ}$). Furthermore, needle visibility was successfully restored under the repositioned probe pose in all 45 trials, with repositioning errors of $1.51\pm0.95$ mm and $1.25\pm0.79^{\circ}$.
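A minimal sketch of the adversarial supervision idea (the toy networks and the weight `lam` are placeholders, not the paper's configuration): a discriminator judges whether a mask paired with the US image is ground truth or predicted, pushing the segmenter toward GT-consistent masks in a higher-order feature space:

```python
# Toy adversarial mask supervision: real = (image, GT mask),
# fake = (image, predicted mask). All networks are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

seg = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 1, 3, padding=1))          # toy segmenter
disc = nn.Sequential(nn.Conv2d(2, 8, 3, stride=2), nn.ReLU(),
                     nn.Flatten(), nn.LazyLinear(1))        # toy critic

img = torch.rand(4, 1, 64, 64)
gt = torch.randint(0, 2, (4, 1, 64, 64)).float()
pred = torch.sigmoid(seg(img))

# Discriminator loss: tell GT mask pairs apart from predicted ones.
d_real = disc(torch.cat([img, gt], 1))
d_fake = disc(torch.cat([img, pred.detach()], 1))
d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
       + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

# Segmenter loss: per-pixel term plus adversarial term that fools the critic.
lam = 0.1  # assumed weight
g_adv = disc(torch.cat([img, pred], 1))
g_loss = F.binary_cross_entropy(pred, gt) \
       + lam * F.binary_cross_entropy_with_logits(g_adv, torch.ones_like(g_adv))
```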
Abstract: Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements because they relied mainly on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework, "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor policy for surgical grasping, enhancing both generality and robustness. In particular, a novel method is proposed to estimate the values and uncertainties of depth pixels in a rigid-link object's inaccurate region based on an empirical prior of the object's size; both depth and mask images of task objects are encoded into a single compact 3-channel image (size: 64x64x3) by dynamically zooming into the mask regions, minimizing information loss. The learned controller's effectiveness is extensively evaluated in simulation and on a real robot. Our learned visuomotor policy handles: i) unseen objects, including 5 types of target grasping objects and a robot gripper, in unstructured real-world surgery environments, and ii) disturbances in perception and control. To our knowledge, this is the first work to achieve a unified surgical control system that grasps diverse surgical objects using different robot grippers on real robots in complex surgery scenes (average success rate: 69%). Our system also demonstrates significant robustness across 6 conditions, including background variation, target disturbance, camera pose variation, kinematic control error, image noise, and re-grasping after the gripped target object drops from the gripper. Videos and code can be found on our project page: https://linhongbin.github.io/gas/.
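As a hedged sketch of the compact-observation idea (the channel layout, padding margin, and function name are assumptions for illustration, not the released GAS code), one can crop to the union of the task-object masks, resize to 64x64, and stack depth and masks into three channels:

```python
# Sketch: zoom into the mask region and pack depth + masks into 64x64x3.
import numpy as np
import cv2

def encode_observation(depth, mask_target, mask_gripper, margin=8, size=64):
    """depth: HxW float; masks: HxW {0,1}. Returns a (size, size, 3) uint8 image."""
    ys, xs = np.nonzero(mask_target | mask_gripper)
    if len(ys) == 0:                       # nothing detected: empty observation
        return np.zeros((size, size, 3), np.uint8)
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, depth.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, depth.shape[1])
    crop = lambda im: cv2.resize(im[y0:y1, x0:x1].astype(np.float32),
                                 (size, size), interpolation=cv2.INTER_NEAREST)
    d = crop(depth)
    d = (d - d.min()) / (d.max() - d.min() + 1e-6)  # normalize inside the crop
    obs = np.stack([d * 255, crop(mask_target) * 255,
                    crop(mask_gripper) * 255], axis=-1)
    return obs.astype(np.uint8)

depth = np.random.rand(480, 640)
m_t = np.zeros((480, 640), np.uint8); m_t[200:240, 300:360] = 1
m_g = np.zeros((480, 640), np.uint8); m_g[180:210, 280:320] = 1
obs = encode_observation(depth, m_t, m_g)   # shape (64, 64, 3)
```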
Abstract: In robotic deformable object manipulation (DOM) applications, constraints commonly arise from environments and task-specific requirements. Enabling DOM under constraints is therefore crucial for its deployment in practice. However, dealing with constraints is challenging due to many inherent factors, such as inaccessible deformation models of deformable objects (DOs) and varying environmental setups. This article presents a systematic manipulation framework for constrained DOM by proposing a novel path set planning and tracking scheme. First, constrained DOM tasks are formulated in a versatile optimization formalism that enables dynamic constraint imposition. Owing to the lack of a local optimization objective and the high state dimensionality, the formulated problem is not analytically solvable. To address this, we subsequently propose planning of the path set, which collects the paths of DO feedback points, to provide feasible path and motion references for the DO in constrained setups. Both theoretical analyses and a computationally efficient algorithmic implementation of path set planning are discussed. Lastly, a control architecture combining path set tracking and constraint handling is designed for task execution. The effectiveness of our methods is validated in a variety of DOM tasks with constrained experimental settings.
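As a hedged illustration of such an optimization formalism (the symbols below are generic, not the article's notation), one constrained DOM step can be posed as

\[
\min_{\dot{q}} \;\; \|\dot{y}_d - J(q)\,\dot{q}\|^2 + \lambda\,\|\dot{q}\|^2
\quad \text{s.t.} \quad c_i(y, q) \le 0, \;\; i \in \mathcal{A}(t),
\]

where $y$ stacks the DO feedback points, $J(q)$ is a locally estimated deformation Jacobian, and $\mathcal{A}(t)$ is the set of constraints dynamically imposed at time $t$. The planned path set then supplies the references $\dot{y}_d$ in constrained setups.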
Abstract: Robotic skill learning has been increasingly studied, but collecting demonstrations is more challenging than collecting images/videos in computer vision or text in natural language processing. This paper presents a skill-learning paradigm that uses intuitive teleoperation devices to efficiently generate high-quality human demonstrations for data-driven robotic skill learning. Building on a reliable teleoperation interface, the da Vinci Research Kit (dVRK) master, we propose a system called dVRK-Simulator-for-Demonstration (dS4D). Experiments on various manipulation tasks show the system's effectiveness and its efficiency advantages over other interfaces. We also investigate using the collected data for policy learning, which verifies initial feasibility. We believe the proposed paradigm can facilitate robot learning driven by high-quality demonstrations while generating them efficiently.
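A minimal sketch of the demonstration-collection loop implied by such a system (the `Master` and `Sim` classes are hypothetical stand-ins, not the dVRK or dS4D APIs): read the teleoperation master, step the simulator, and log (observation, action) pairs:

```python
# Hypothetical teleoperated demonstration recording loop.
import random, time

class Master:                                 # stand-in for the master device
    def read_command(self):
        return [random.uniform(-1, 1) for _ in range(6)]

class Sim:                                    # stand-in simulator
    def __init__(self): self.state = [0.0] * 6
    def step(self, a): self.state = [s + 0.01 * u for s, u in zip(self.state, a)]
    def observe(self): return list(self.state)

def collect_demo(master, sim, horizon=500):
    traj = []
    for _ in range(horizon):
        a = master.read_command()             # operator command
        traj.append({"obs": sim.observe(), "action": a, "t": time.time()})
        sim.step(a)                           # execute in simulation
    return traj                               # one demonstration episode

demo = collect_demo(Master(), Sim())
```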
Abstract: This paper proposes an interactive navigation framework using large language models and vision-language models, allowing robots to navigate in environments with traversable obstacles. We utilize a large language model (GPT-3.5) and an open-set vision-language model (Grounding DINO) to create an action-aware costmap for effective path planning without fine-tuning. With these large models, we achieve an end-to-end system from textual instructions like "Can you pass through the curtains to deliver medicines to me?" to bounding boxes (e.g., curtains) with action-aware attributes. These are used to segment LiDAR point clouds into traversable and untraversable parts, from which an action-aware costmap is constructed for generating a feasible path. The pre-trained large models have strong generalization ability and require no additional annotated training data, allowing fast deployment in interactive navigation tasks. For verification, we instruct the robot to traverse multiple traversable objects such as curtains and grass; we also test traversing curtains in a medical scenario. All experimental results demonstrate the proposed framework's effectiveness and adaptability to diverse environments.
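As a hedged sketch of the action-aware costmap construction (grid size, resolution, and cost values are assumptions, not the paper's parameters), LiDAR points inside boxes labeled traversable (e.g., "curtain") get a low, finite cost while other obstacle points are lethal:

```python
# Sketch: rasterize VLM-labeled obstacle points into an action-aware costmap.
import numpy as np

LETHAL, TRAVERSE_COST, FREE = 255, 50, 0  # assumed cost values

def action_aware_costmap(points, traversable_mask, res=0.05, size=200):
    """points: Nx2 (x, y) obstacle points [m]; traversable_mask: N bools from
    the VLM-labeled boxes. Returns a size x size cost grid centered at origin."""
    grid = np.full((size, size), FREE, np.uint8)
    idx = np.floor(points / res).astype(int) + size // 2
    valid = (idx >= 0).all(1) & (idx < size).all(1)
    idx, trav = idx[valid], traversable_mask[valid]
    grid[idx[~trav, 1], idx[~trav, 0]] = LETHAL   # hard obstacles
    # Traversable objects: passable but discouraged unless instructed.
    cur = grid[idx[trav, 1], idx[trav, 0]]
    grid[idx[trav, 1], idx[trav, 0]] = np.maximum(cur, TRAVERSE_COST)
    return grid

pts = np.random.uniform(-4, 4, (1000, 2))
trav = np.random.rand(1000) < 0.3
costmap = action_aware_costmap(pts, trav)   # feed to any grid path planner
```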
Abstract: Needle picking is a challenging surgical task in robot-assisted surgery due to needles' small, slender shapes, their variation in shape and size, and the demand for millimeter-level control. Prior works, which rely heavily on needle priors (e.g., geometric models), are hard to scale to unseen needle variations. In addition, visual tracking errors cannot be minimized online with their approaches. In this paper, we propose an end-to-end deep visual learning framework for needle-picking tasks in which both visual and control components are learned jointly online. Our framework integrates a state-of-the-art reinforcement learning method, Dreamer, with behavior cloning (BC). In addition, two novel techniques, Virtual Clutch and Dynamic Spotlight Adaptation (DSA), are introduced into our end-to-end visual controller for needle-picking tasks. We conducted extensive experiments in simulation to evaluate the performance, robustness, variation adaptation, and effectiveness of the individual components of our method. Our approach, trained with 8k demonstration timesteps and 140k online policy timesteps, achieves a remarkable success rate of 80%, a new state of the art for end-to-end vision-based surgical robot learning on delicate operation tasks. Furthermore, our method demonstrates superior generalization to unseen dynamic scenarios with needle variations and image disturbance, highlighting its robustness and versatility. Code and videos are available at https://sites.google.com/view/dreamerbc.
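A minimal sketch of combining Dreamer-style policy learning with BC (the toy actor, return estimates, and `lambda_bc` weight are placeholders, not the paper's implementation): the actor loss adds a behavior-cloning term on demonstration batches to the imagined-return objective:

```python
# Sketch: Dreamer-style actor objective augmented with a BC term.
import torch
import torch.nn as nn
import torch.nn.functional as F

def actor_loss(actor, imagined_returns, demo_states, demo_actions, lambda_bc=1.0):
    rl_term = -imagined_returns.mean()                    # maximize imagined returns
    bc_term = F.mse_loss(actor(demo_states), demo_actions)  # match expert actions
    return rl_term + lambda_bc * bc_term

actor = nn.Linear(32, 4)                          # toy policy head
returns = torch.randn(16, requires_grad=True)     # stand-in imagined returns
loss = actor_loss(actor, returns, torch.randn(8, 32), torch.randn(8, 4))
loss.backward()
```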
Abstract: Projected Inverse Dynamics Control (PIDC) is commonly used in robots subject to contact, especially in quadrupedal systems. Many methods based on such dynamics have been developed for quadrupedal locomotion tasks, but only a few works have studied simple interactions between the robot and the environment, such as pressing an E-stop button. To facilitate interactions requiring exact force control for safety, we propose a novel interaction force control scheme for underactuated quadrupedal systems relying on projection techniques and Quadratic Programming (QP). The algorithm allows the robot to apply a desired interaction force to the environment without force sensors while satisfying physical constraints and inducing minimal base motion. Unlike previous projection-based methods, the QP design uses two selection matrices in its hierarchical structure, facilitating the decoupling of force and motion control. The proposed algorithm is verified on a quadrupedal robot in a high-fidelity simulator. Compared to QP designs without the two-selection-matrix strategy and to the PIDC method for contact force control, our method provides more accurate contact force tracking with minimal base movement, paving the way toward exact interaction force control for underactuated quadrupedal systems.
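A toy sketch of the two-selection-matrix idea (the dynamics below are a random stand-in, not a quadruped model, and the weights are arbitrary): one selection matrix `S_f` picks the contact-force components that must track the desired interaction force, while `S_m` penalizes the remaining components to keep base motion small:

```python
# Toy QP with two selection matrices (illustrative, not the paper's QP).
import numpy as np
import cvxpy as cp

n = 12                                   # stacked contact-force decision variable
rng = np.random.default_rng(0)
A_dyn = rng.standard_normal((6, n))      # stand-in base-dynamics rows
b_dyn = rng.standard_normal(6)

S_f = np.zeros((3, n)); S_f[:, :3] = np.eye(3)   # selects the interaction contact
S_m = np.eye(n)[3:]                              # selects the remaining components
f_des = np.array([0.0, 0.0, 20.0])               # desired pressing force [N]

f = cp.Variable(n)
objective = cp.Minimize(
    cp.sum_squares(S_f @ f - f_des)              # track the desired force
    + 1e-2 * cp.sum_squares(S_m @ f)             # keep support forces small
    + cp.sum_squares(A_dyn @ f - b_dyn))         # soft dynamics consistency
constraints = [f[2::3] >= 0]                     # unilateral normal forces
cp.Problem(objective, constraints).solve()
print(np.round(f.value, 2))
```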
Abstract: Learning high-performance deep neural networks for dynamic modeling of high Degree-Of-Freedom (DOF) robots remains challenging due to sampling complexity. Unknown system disturbances caused by unmodeled dynamics (such as internal compliance and cables) further exacerbate the problem. In this paper, we propose a novel framework, characterized by both high data efficiency and disturbance-adapting capability, to model gravitational dynamics with deep networks for feedforward gravity compensation control of high-DOF master manipulators under unknown disturbance. In particular, Feedforward Deep Neural Networks (FDNNs) are learned from both prior knowledge of an existing analytical model and observations of the robot system via Knowledge Distillation (KD). Through extensive experiments on high-DOF master manipulators with significant disturbance, we show that our method surpasses a standard Learning-from-Scratch (LfS) approach in data efficiency and disturbance adaptation. Our initial feasibility study demonstrates the potential to outperform the analytical teacher model as the training data increases.
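A minimal sketch of the distillation idea (the stand-in teacher, network sizes, and the trade-off weight `alpha` are placeholders, not the paper's setup): the FDNN student fits a blend of the analytical gravity model's outputs (prior knowledge) and measured torques (observations):

```python
# Sketch: knowledge distillation from an analytical gravity model into an FDNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

def analytical_gravity(q):               # stand-in analytical teacher model
    return 2.0 * torch.sin(q)            # NOT a real manipulator gravity model

student = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 7))

q = torch.rand(256, 7)                   # joint positions
tau_meas = analytical_gravity(q) + 0.1 * torch.randn(256, 7)  # noisy "robot data"

alpha = 0.5                              # assumed teacher/data trade-off
pred = student(q)
loss = alpha * F.mse_loss(pred, analytical_gravity(q)) \
     + (1 - alpha) * F.mse_loss(pred, tau_meas)
loss.backward()                          # one distillation step
```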
Abstract: The falling-cat problem is well known: cats exhibit a remarkable aerial reorientation capability and can land safely. For their robotic counterparts, the analogous falling quadruped robot problem has not been fully addressed, although achieving cat-like safe landing has been increasingly investigated. Rather than placing the burden entirely on landing control, we approach safe landing of falling quadruped robots through effective flight-phase control. Different from existing work such as swinging legs or attaching reaction wheels or simple tails, we propose to deploy a 3-DoF morphable inertial tail on a medium-sized quadruped robot. In the flight phase, the tail at its maximum length can effectively self-right the body orientation in 3D; before touch-down, the tail can be retracted to about 1/4 of its maximum length to suppress its side effects on landing. To enable aerial reorientation for safe landing, we design a control architecture, which is verified in a high-fidelity physics simulation environment under different initial conditions. Experimental results on a customized flight-phase test platform with comparable inertial properties show the tail's effectiveness for 3D body reorientation and its fast retractability before touch-down. An initial falling quadruped robot experiment is presented, in which the Unitree A1 robot with the 3-DoF tail lands safely subject to non-negligible initial body angles.
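A back-of-envelope sketch of why tail retraction suppresses the landing side effect (all inertia and rate values below are made up for illustration): with zero total angular momentum in flight, commanding the tail at a given rate rotates the body in the opposite sense, scaled by the tail-to-body inertia ratio, so retracting the tail to ~1/4 length cuts its influence by roughly 1/16 under a point-mass tail model:

```python
# Momentum-conservation sketch of tail-driven body reorientation.
I_body = 0.5          # body inertia about the rotation axis [kg m^2] (assumed)

def tail_inertia(length, m_tail=0.8):
    return m_tail * length ** 2          # point-mass tail model (assumed)

# Full-length tail in flight vs. retracted (~1/4 length) before touch-down:
for L in (0.6, 0.15):
    omega_tail = 10.0                            # commanded tail rate [rad/s]
    omega_body = -tail_inertia(L) / I_body * omega_tail  # momentum conservation
    print(f"L={L:.2f} m: body rate = {omega_body:.2f} rad/s")
```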