Abstract:What is considered safe for a robot operator during physical human-robot collaboration (HRC) is specified in corresponding HRC standards (e.g., the European ISO/TS 15066). The regime that allows collisions between the moving robot and the operator, called Power and Force Limiting (PFL), restricts the permissible contact forces. Using the same fixed contact thresholds on the entire robot surface results in significant and unnecessary productivity losses, as the robot needs to stop even when impact forces are within limits. Here we present a framework for setting the protective skin thresholds individually for different parts of the robot body and dynamically on the fly, based on the effective mass of each robot link and the link velocity. We perform experiments on a 6-axis collaborative robot arm (UR10e) completely covered with a sensitive skin (AIRSKIN) consisting of eleven individual pads. On a mock pick-and-place scenario with both transient and quasi-static collisions, we demonstrate how skin sensitivity influences the task performance and exerted force. We show an increase in productivity of almost 50% from the most conservative setting of collision thresholds to the most adaptive setting, while ensuring safety for human operators. The method is applicable to any robot for which the effective mass can be calculated.
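The dependence of the contact force on link velocity and effective mass can be illustrated with the two-body spring contact model commonly associated with ISO/TS 15066. The following is a minimal sketch, not the thresholding rule of the framework itself: the function names are hypothetical, and the numeric values for a hand contact (maximum transient force, spring constant, effective mass of the body part) are illustrative assumptions that should be checked against the standard.

```python
import math


def reduced_mass(m_human: float, m_robot_eff: float) -> float:
    """Reduced mass (kg) of the two-body system: human body part and robot link."""
    return 1.0 / (1.0 / m_human + 1.0 / m_robot_eff)


def impact_force(v_rel: float, k: float, m_human: float, m_robot_eff: float) -> float:
    """Peak force (N) in the spring contact model: F = v_rel * sqrt(mu * k)."""
    return v_rel * math.sqrt(reduced_mass(m_human, m_robot_eff) * k)


def max_relative_speed(f_max: float, k: float, m_human: float, m_robot_eff: float) -> float:
    """Largest relative speed (m/s) for which the peak force stays at or below f_max."""
    return f_max / math.sqrt(reduced_mass(m_human, m_robot_eff) * k)


# Illustrative values for a hand contact (assumed; verify against the standard).
F_MAX = 280.0      # permissible transient force, N
K_HAND = 75_000.0  # effective spring constant, N/m
M_HAND = 0.6       # effective mass of the body part, kg

# A heavier effective link mass leaves a lower permissible speed.
for m_eff in (2.0, 5.0, 10.0):
    v = max_relative_speed(F_MAX, K_HAND, M_HAND, m_eff)
    print(f"effective mass {m_eff:4.1f} kg -> speed limit {v:.2f} m/s")
```

Under this model, a link with a small effective mass may move faster before a contact exceeds the force limit, which is the intuition behind setting thresholds per link rather than uniformly.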
Abstract:Artificial electronic skins covering complete robot bodies can make physical human-robot collaboration safe and hence possible. Standards for collaborative robots (e.g., ISO/TS 15066) prescribe permissible forces and pressures during contacts with the human body. These characteristics of the collision depend not only on the speed of the colliding robot link but also on its effective mass. Thus, to guarantee contacts that comply with the Power and Force Limiting (PFL) collaborative regime while maximizing productivity, protective skin thresholds should be set individually for different parts of the robot body and dynamically on the fly. Here we present and empirically evaluate four scenarios: (a) static and uniform, with fixed thresholds for the whole skin; (b) static but with different settings for individual robot body parts; (c) dynamically set based on the velocity of every link; (d) dynamically set based on the effective mass of every robot link. We perform experiments in simulation and on a real 6-axis collaborative robot arm (UR10e) completely covered with sensitive skin (AIRSKIN) comprising eleven individual pads. On a mock pick-and-place scenario with transient collisions with the robot body parts and two collision reactions (stop and avoid), we demonstrate the boost in productivity in going from the most conservative setting of the skin thresholds (a) to the most adaptive setting (d). The threshold settings for every skin pad are adapted at a frequency of 25 Hz. This work can be easily extended to platforms with more degrees of freedom and larger skin coverage (humanoids) and to social human-robot interaction scenarios where contacts with the robot will be used for communication.
Abstract:Automatic markerless estimation of infant posture and motion from ordinary videos carries great potential for movement studies "in the wild", facilitating understanding of motor development and massively increasing the chances of early diagnosis of disorders. There is rapid development of human pose estimation methods in computer vision thanks to advances in deep learning and machine learning. However, these methods are trained on datasets featuring adults in different contexts. This work tests and compares seven popular methods (AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose, OpenPose, and ViTPose) on videos of infants in supine position. Surprisingly, all methods except DeepLabCut and MediaPipe have competitive performance without additional finetuning, with ViTPose performing best. Next to standard performance metrics (object keypoint similarity, average precision and recall), we introduce errors expressed in the neck-mid-hip ratio and additionally study missed and redundant detections and the reliability of the internal confidence ratings of the different methods, which are relevant for downstream tasks. Among the networks with competitive performance, only AlphaPose could run close to real time (27 fps) on our machine. We provide documented Docker containers or instructions for all the methods we used, our analysis scripts, and processed data at https://hub.docker.com/u/humanoidsctu and https://osf.io/x465b/.
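As a rough illustration of the metrics mentioned above, the sketch below computes object keypoint similarity (OKS) in its usual COCO-style form and a keypoint error normalized by the neck to mid-hip distance. The function names and the exact normalization are assumptions; the analysis scripts linked in the abstract remain the authoritative definition.

```python
import numpy as np


def object_keypoint_similarity(pred, gt, visible, area, kappas):
    """COCO-style OKS: mean over visible keypoints of exp(-d^2 / (2 * area * kappa^2)).

    pred, gt : (N, 2) arrays of keypoint coordinates in pixels
    visible  : (N,) boolean mask of annotated keypoints
    area     : object scale squared (e.g. bounding-box area in pixels^2)
    kappas   : (N,) per-keypoint falloff constants
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    visible = np.asarray(visible, bool)
    kappas = np.asarray(kappas, float)
    d2 = np.sum((pred - gt) ** 2, axis=1)
    sim = np.exp(-d2 / (2.0 * area * kappas ** 2))
    return float(sim[visible].mean()) if visible.any() else 0.0


def neck_mid_hip_error(pred, gt, neck, mid_hip):
    """Per-keypoint Euclidean error divided by the ground-truth neck to mid-hip distance."""
    torso = np.linalg.norm(np.asarray(neck, float) - np.asarray(mid_hip, float))
    err = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=1)
    return err / torso
```

Expressing errors relative to a torso length makes them comparable across videos recorded at different resolutions and camera distances.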
Abstract:Embodied reasoning systems integrate robotic hardware and cognitive processes to perform complex tasks, typically in response to a natural language query about a specific physical environment. This usually involves changing the belief about the scene or physically interacting with and changing the scene (e.g., 'Sort the objects from lightest to heaviest'). To facilitate the development of such systems, we introduce a new simulation environment that uses the MuJoCo physics engine and the high-quality renderer Blender to provide realistic visual observations that are also accurate to the physical state of the scene. Together with the simulator, we propose a new benchmark composed of 10 classes of multi-step reasoning scenarios that require simultaneous visual and physical measurements. Finally, we develop a new modular Closed Loop Interactive Reasoning (CLIER) approach that takes into account the measurements of non-visual object properties, changes in the scene caused by external disturbances, and uncertain outcomes of robotic actions. We extensively evaluate our reasoning approach in simulation and in real-world manipulation tasks, with success rates above 76% and 64%, respectively.
Abstract:This work presents a framework for automatically extracting physical object properties, such as material composition, mass, volume, and stiffness, through robot manipulation and a database of object measurements. The framework involves exploratory action selection to maximize learning about objects on a table. A Bayesian network models conditional dependencies between object properties, incorporating prior probability distributions and uncertainty associated with measurement actions. The algorithm selects optimal exploratory actions based on expected information gain and updates object properties through Bayesian inference. Experimental evaluation demonstrates effective action selection compared to a baseline and correct termination of the experiments if there is nothing more to be learned. The algorithm proved to behave intelligently when presented with trick objects with material properties in conflict with their appearance. The robot pipeline integrates with a logging module and an online database of objects, containing over 24,000 measurements of 63 objects with different grippers. All code and data are publicly available, facilitating automatic digitization of objects and their physical properties through exploratory manipulations.
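The action selection described above can be sketched for a single discrete property. This is a simplified illustration, not the Bayesian network of the framework: the property (two candidate materials), the action names, and the likelihood tables are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def posterior(prior, likelihood_col):
    """Bayes update for one observed outcome; likelihood_col[h] = P(outcome | h)."""
    unnorm = np.asarray(prior, float) * np.asarray(likelihood_col, float)
    return unnorm / unnorm.sum()


def expected_information_gain(prior, likelihood):
    """Expected entropy reduction of an action; likelihood[h, o] = P(o | h)."""
    prior = np.asarray(prior, float)
    likelihood = np.asarray(likelihood, float)
    p_obs = prior @ likelihood  # marginal probability of each outcome
    h_prior = entropy(prior)
    gain = 0.0
    for o, p_o in enumerate(p_obs):
        if p_o > 0:
            gain += p_o * (h_prior - entropy(posterior(prior, likelihood[:, o])))
    return gain


def select_action(prior, actions, min_gain=1e-3):
    """Return the most informative action, or None when nothing is left to learn."""
    gains = {name: expected_information_gain(prior, lik) for name, lik in actions.items()}
    best = max(gains, key=gains.get)
    return (best if gains[best] >= min_gain else None), gains


# Hypothetical example: two candidate materials, two exploratory actions.
prior = [0.5, 0.5]
actions = {
    "look": [[0.5, 0.5], [0.5, 0.5]],     # uninformative about the material
    "squeeze": [[0.9, 0.1], [0.1, 0.9]],  # outcome depends strongly on the material
}
best, gains = select_action(prior, actions)
```

Returning `None` when every gain falls below `min_gain` mirrors the termination behaviour mentioned in the abstract; the threshold value itself is an assumption.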
Abstract:Standard robot grippers are not designed for elasticity estimation. In this work, a professional biaxial compression device was used as a control setup to study the accuracy with which material properties can be estimated by two standard parallel jaw grippers and a force/torque sensor mounted at the robot wrist. Using three sets of deformable objects, different parameters were varied to observe their effect on measuring material characteristics: (1) repeated compression cycles, (2) compression speed, and (3) the surface area of the gripper jaws. Gripper effort versus position curves were obtained and transformed into stress/strain curves. The modulus of elasticity was estimated at different strain points. Viscoelasticity was assessed using the energy absorbed in a compression/decompression cycle, the Kelvin-Voigt, and Hunt-Crossley models. Our results can be summarized as follows: (1) better results were obtained with slower compression speeds, while additional compression cycles or surface area did not improve estimation; (2) the robot grippers, even after calibration, were found to have a limited capability of delivering accurate estimates of absolute values of Young's modulus and viscoelasticity; (3) relative ordering of material characteristics was largely consistent across different grippers; (4) despite the nonlinear characteristics of deformable objects, fitting linear stress/strain approximations led to more stable results than local estimates of Young's modulus; (5) to assess viscoelasticity, the Hunt-Crossley model worked best. Finally, we show that a two-dimensional space representing elasticity and viscoelasticity estimates is advantageous for the discrimination of deformable objects. A single-grasp, online, classification and sorting of such objects is thus possible. An additional contribution is the dataset and data processing codes that we make publicly available.
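A minimal sketch of two of the estimates mentioned above: a linear stress/strain fit whose slope serves as the modulus estimate, and the energy absorbed in one compression/decompression cycle. The function names, the conversion from gripper opening to strain, and the use of a constant contact area are assumptions made for illustration; the published dataset and processing code define the actual procedure.

```python
import numpy as np


def stress_strain(force_n, opening_m, initial_width_m, contact_area_m2):
    """Convert gripper force and opening to engineering stress (Pa) and strain."""
    stress = np.asarray(force_n, float) / contact_area_m2
    strain = (initial_width_m - np.asarray(opening_m, float)) / initial_width_m
    return stress, strain


def modulus_linear_fit(stress, strain):
    """Slope of a least-squares line through the stress/strain curve (Pa)."""
    slope, _intercept = np.polyfit(strain, stress, 1)
    return float(slope)


def absorbed_energy(force_n, displacement_m):
    """Signed trapezoidal integral of F dx over a full cycle (J).

    Samples must be in recording order: compression first, then decompression.
    The result is the area enclosed by the hysteresis loop.
    """
    f = np.asarray(force_n, float)
    x = np.asarray(displacement_m, float)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))
```

A purely elastic object would give an absorbed energy close to zero, while a viscoelastic one returns less work during decompression than was put in during compression.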
Abstract:For safe and effective operation of humanoid robots in human-populated environments, the problem of commanding a large number of Degrees of Freedom (DoF) while simultaneously considering dynamic obstacles and human proximity has still not been solved. We present a new reactive motion controller that commands two arms of a humanoid robot and three torso joints (17 DoF in total). We formulate a quadratic program that seeks joint velocity commands respecting multiple constraints while minimizing the magnitude of the velocities. We introduce a new unified treatment of obstacles that dynamically maps visual and proximity (pre-collision) and tactile (post-collision) obstacles as additional constraints to the motion controller, in a distributed fashion over the surface of the upper body of the iCub robot (with 2000 pressure-sensitive receptors). The bio-inspired controller: (i) produces human-like minimum jerk movement profiles; (ii) gives rise to a robot with whole-body visuo-tactile awareness, resembling peripersonal space representations. The controller was validated extensively in experiments, including a physical human-robot interaction scenario.
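A quadratic program of the kind described above can be written in the following generic form. This is a sketch under assumptions, not the controller's actual formulation: the task constraint, the joint velocity bounds, and the per-obstacle constraints (with hypothetical symbols $J_{c_j}$ for the Jacobian of the $j$-th control point on the robot surface, $n_j$ for the direction toward the obstacle, and $v_j$ for the permitted approach speed) are illustrative choices.

```latex
\begin{aligned}
\min_{\dot{q} \in \mathbb{R}^{n}} \quad & \tfrac{1}{2}\,\dot{q}^{\top}\dot{q} \\
\text{s.t.} \quad & J(q)\,\dot{q} = \dot{x}_{d}, \\
& \dot{q}_{\min} \le \dot{q} \le \dot{q}_{\max}, \\
& n_{j}^{\top} J_{c_j}(q)\,\dot{q} \le v_{j}, \qquad j = 1, \dots, m .
\end{aligned}
```

Here $n = 17$ for the two arms and three torso joints, $J(q)$ is the end-effector Jacobian, and $\dot{x}_{d}$ is the desired Cartesian velocity. In such a form, each visual, proximity, or tactile obstacle would add one inequality row, so the number of constraints $m$ changes at run time while the objective stays the same.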
Abstract:For robot manipulation, a complete and accurate object shape is desirable. Here, we present a method that combines visual and haptic reconstruction in a closed-loop pipeline. From an initial viewpoint, the object shape is reconstructed using an implicit surface deep neural network. The location with the highest uncertainty is selected for haptic exploration and the object is touched; the new information from touch and a new point cloud from the camera are added; the object position is re-estimated; and the cycle is repeated. We extend Rustler et al. (2022) by using a new theoretically grounded method to determine the points with the highest uncertainty, and we increase the yield of every haptic exploration by adding not only the contact points to the point cloud but also the empty space established by the robot's movement toward the object. Additionally, the solution is compact in that the jaws of a closed two-finger gripper are directly used for exploration. The object position is re-estimated after every robot action, and multiple objects can be present simultaneously on the table. We achieve a steady improvement with every touch using three different metrics and demonstrate the utility of the better shape reconstruction in grasping experiments on the real robot. On average, grasp success rate increases from 63.3% to 70.4% after a single exploratory touch and to 82.7% after five touches. The collected data and code are publicly available (https://osf.io/j6rkd/, https://github.com/ctu-vras/vishac).
Abstract:Two regimes permitting safe physical human-robot interaction, speed and separation monitoring and safety-rated monitored stop, depend on reliable perception of the space surrounding the robot. This can be accomplished by visual sensors (such as cameras, RGB-D cameras, and LIDARs), proximity sensors, or dedicated devices used in industrial settings, such as pads that are activated by the presence of the operator. The deployment of a particular solution is often ad hoc, and no unified representation of the interaction space or its coverage by the different sensors exists. In this work, we take first steps in this direction by defining the spaces to be monitored, representing all sensor data as information about occupancy, and using occupancy-based metrics to calculate how a particular sensor covers the workspace. We demonstrate our approach in two (multi-)sensor-placement experiments in three static scenes and one experiment in a dynamic scene. The occupancy representation allows the effectiveness of various sensor setups to be compared. Therefore, this approach can serve as a prototyping tool to establish the sensor setup that provides the most efficient coverage for the given metrics and sensor representations.
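One simple occupancy-based metric of the kind described above is the fraction of the monitored space that at least one sensor observes. The sketch below assumes a voxel grid in which every sensor contributes a boolean mask of the voxels it can observe; the function name and this particular metric are hypothetical illustrations, not the metrics defined in the work.

```python
import numpy as np


def coverage(monitored, sensor_masks):
    """Fraction of monitored voxels observed by at least one sensor.

    monitored    : boolean voxel grid marking the space to be monitored
    sensor_masks : iterable of boolean grids of the same shape, one per sensor
    """
    monitored = np.asarray(monitored, bool)
    seen = np.zeros_like(monitored)
    for mask in sensor_masks:
        seen |= np.asarray(mask, bool)
    total = monitored.sum()
    return float((seen & monitored).sum() / total) if total else 0.0


# Toy 2 x 2 x 2 workspace with two hypothetical sensors.
workspace = np.ones((2, 2, 2), dtype=bool)
sensor_a = np.zeros_like(workspace)
sensor_a[0, :, :] = True  # observes one half of the grid
sensor_b = np.zeros_like(workspace)
sensor_b[:, 0, :] = True  # observes a different, overlapping half

single = coverage(workspace, [sensor_a])
combined = coverage(workspace, [sensor_a, sensor_b])
```

Because every sensor is reduced to the same occupancy representation, alternative placements or sensor combinations can be compared by evaluating the same function on different mask sets.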
Abstract:Humans and animals excel in combining information from multiple sensory modalities, controlling their complex bodies, adapting to growth and failures, and using tools. These capabilities are also highly desirable in robots. Machines display them to some extent, yet they still lag behind their biological counterparts. The key foundation is an internal representation of the body that the agent - human, animal, or robot - has developed. The mechanisms of operation of body models in the brain are largely unknown, and even less is known about how they are constructed from experience after birth. In collaboration with developmental psychologists, we conducted targeted experiments to understand how infants acquire first "sensorimotor body knowledge". These experiments inform our work in which we construct embodied computational models on humanoid robots that address the mechanisms behind learning, adaptation, and operation of multimodal body representations. At the same time, we assess which of the features of the "body in the brain" should be transferred to robots to give rise to more adaptive and resilient, self-calibrating machines. We extend traditional robot kinematic calibration, focusing on self-contained approaches where no external metrology is needed: self-contact and self-observation. We present a problem formulation that allows several ways of closing the kinematic chain to be combined simultaneously, along with a calibration toolbox and experimental validation on several robot platforms. Finally, in addition to models of the body itself, we study peripersonal space - the space immediately surrounding the body. Again, embodied computational models are developed, and subsequently the possibility of turning these biologically inspired representations into safe human-robot collaboration is studied.