Abstract:Robotic manipulation has made significant advancements, with systems demonstrating high precision and repeatability. However, this remarkable precision often fails to translate into efficient manipulation of thin deformable objects. Current robotic systems lack imprecise dexterity, the ability to perform dexterous manipulation through robust and adaptive behaviors that do not rely on precise control. This paper explores the singulation and grasping of thin, deformable objects. Here, we propose a novel solution that incorporates passive compliance, touch, and proprioception into thin, deformable object manipulation. Our system employs a soft, underactuated hand that provides passive compliance, facilitating adaptive and gentle interactions to dexterously manipulate deformable objects without requiring precise control. The tactile and force/torque sensors equipped on the hand, along with a depth camera, gather sensory data required for manipulation via the proposed slip module. The manipulation policies are learned directly from raw sensory data via model-free reinforcement learning, bypassing explicit environmental and object modeling. We implement a hierarchical double-loop learning process to enhance learning efficiency by decoupling the action space. Our method was deployed on real-world robots and trained in a self-supervised manner. The resulting policy was tested on a variety of challenging tasks that were beyond the capabilities of prior studies, ranging from displaying suit fabric like a salesperson to turning pages of sheet music for violinists.
Abstract:As automation technologies advance, the need for compact and multi-modal sensors in robotic applications is growing. To address this demand, we introduce CompdVision, a novel sensor that combines near-field 3D visual and tactile sensing. This sensor, with dimensions of 22$\times$14$\times$14 mm, leverages a compound-eye imaging system to achieve a compact form factor without compromising its dual modalities. CompdVision utilizes two types of vision units to meet diverse sensing requirements. Stereo units with far-focus lenses can see through the transparent elastomer, facilitating depth estimation beyond the contact surface, while tactile units with near-focus lenses track the movement of markers embedded in the elastomer to obtain contact deformation. Experimental results validate the sensor's superior performance in 3D visual and tactile sensing. The sensor demonstrates effective depth estimation within a 70 mm range from its surface. Additionally, it registers high accuracy in tangential and normal force measurements. The dual modalities and compact design make the sensor a versatile tool for complex robotic tasks.
Abstract:Numerous soft actuators based on PneuNet design have already been proposed and extensively employed across various soft robotics applications in recent years. Despite their widespread use, a common limitation of most existing designs is that their action is pre-determined during the fabrication process, thereby restricting the ability to modify or alter their function during operation. To address this shortcoming, in this article the design of a Reconfigurable, Transformable Soft Pneumatic Actuator (RT-SPA) is proposed. The working principle of the RT-SPA is analogous to the conventional PneuNet. The key distinction between the two lies in the ability of the RT-SPA to undergo controlled transformations, allowing for more versatile bending and twisting motions in various directions. Furthermore, the unique reconfigurable design of the RT-SPA enables the selection of actuation units with different sizes to achieve a diverse range of three-dimensional deformations. This versatility enhances the potential of the RT-SPA for adaptation to a multitude of tasks and environments, setting it apart from traditional PneuNet. The paper begins with a detailed description of the design and fabrication of the RT-SPA. Following this, a series of experiments are conducted to evaluate the performance of the RT-SPA. Finally, the abilities of the RT-SPA for locomotion, gripping, and object manipulation are demonstrated to illustrate the versatility of the RT-SPA across different aspects.
Abstract:Fire is a common disaster in daily life. To achieve fast and accurate fire detection, this paper proposes a detection network called FSDNet (Fire Smoke Detection Network), which consists of a feature extraction module, a fire classification module, and a fire detection module. First, a dense connection structure is introduced in the basic feature extraction module to enhance the feature extraction ability of the backbone network and alleviate the vanishing gradient problem. Second, a spatial pyramid pooling structure is introduced in the fire detection module, and the Mosaic data augmentation method and CIoU loss function are used in the training process to comprehensively improve the flame feature extraction ability. Finally, to address the shortcomings of public fire datasets, a fire dataset called MS-FS (Multi-scene Fire And Smoke) containing 11,938 fire images was created through data collection, screening, and object annotation. To demonstrate the effectiveness of the proposed method, its accuracy was evaluated on two benchmark fire datasets and MS-FS. The experimental results show that the accuracy of FSDNet on the two benchmark datasets is 99.82% and 91.15%, respectively, and the average precision on MS-FS is 86.80%, outperforming mainstream fire detection methods.
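The CIoU loss mentioned in the abstract above can be illustrated with a minimal sketch. This is the standard Complete-IoU formulation for two axis-aligned boxes, not code from FSDNet; the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; eps only guards against division by zero.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Widths and heights of both boxes.
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared distance between the box centres.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike a plain IoU loss, the centre-distance term still gives a useful signal when the boxes do not overlap: for two disjoint unit squares the loss exceeds 1, while for identical boxes it is approximately 0.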
Abstract:Humans excel in grasping objects through diverse and robust policies, many of which are so rare that exploration-based learning methods seldom observe them, let alone learn them. Inspired by the human learning process, we propose a method to extract and exploit latent intents from demonstrations, and then learn diverse and robust grasping policies through self-exploration. The resulting policy can grasp challenging objects in various environments with an off-the-shelf parallel gripper. The key component is a learned intention estimator, which maps gripper pose and visual sensory input to a set of sub-intents covering important phases of the grasping movement. Sub-intents can be used to build an intrinsic reward to guide policy learning. The learned policy demonstrates remarkable zero-shot generalization from simulation to the real world while retaining its robustness against states that have never been encountered during training, novel objects such as protractors and user manuals, and environments such as a cluttered conveyor.
Abstract:This letter introduces ERRA, an embodied learning architecture that enables robots to jointly obtain three fundamental capabilities (reasoning, planning, and interaction) for solving long-horizon language-conditioned manipulation tasks. ERRA is based on tightly coupled probabilistic inferences at two granularity levels. Coarse-resolution inference is formulated as sequence generation through a large language model, which infers action language from the natural language instruction and environment state. The robot then zooms to the fine-resolution inference part to perform the concrete action corresponding to the action language. Fine-resolution inference is constructed as a Markov decision process, which takes action language and environmental sensing as observations and outputs the action. The results of action execution in the environment provide feedback for subsequent coarse-resolution reasoning. Such coarse-to-fine inference allows the robot to decompose and achieve long-horizon tasks interactively. In extensive experiments, we show that ERRA can complete various long-horizon manipulation tasks specified by abstract language instructions. We also demonstrate successful generalization to novel but similar natural language instructions.
Abstract:This paper tackles the task of singulating and grasping paper-like deformable objects. We refer to such tasks as paper-flipping. In contrast to manipulating deformable objects that lack compression strength (such as shirts and ropes), minor variations in the physical properties of the paper-like deformable objects significantly impact the results, making manipulation highly challenging. Here, we present Flipbot, a novel solution for flipping paper-like deformable objects. Flipbot allows the robot to capture object physical properties by integrating exteroceptive and proprioceptive perceptions that are indispensable for manipulating deformable objects. Furthermore, by incorporating a proposed coarse-to-fine exploration process, the system is capable of learning the optimal control parameters for effective paper-flipping through proprioceptive and exteroceptive inputs. We deploy our method on a real-world robot with a soft gripper and learn in a self-supervised manner. The resulting policy demonstrates the effectiveness of Flipbot on paper-flipping tasks with various settings beyond the reach of prior studies, including but not limited to flipping pages throughout a book and emptying paper sheets in a box.
Abstract:Machine learning (ML) interatomic models and potentials have been widely employed in simulations of materials. Long-range interactions often dominate in some ionic systems and significantly influence their dynamical behavior. However, long-range effects such as Coulomb and van der Waals interactions are not considered in most ML interatomic potentials. To address this issue, we put forward a method that accounts for long-range effects in most local ML interatomic models by means of a reciprocal-space neural network. The structure information in real space is first transformed into reciprocal space and then encoded into a reciprocal-space potential or a global descriptor with full atomic interactions. The reciprocal-space potential and descriptor are fully invariant under Euclidean symmetry operations and the choice of cell. Benefiting from the reciprocal-space information, ML interatomic models can be extended to describe long-range potentials, including not only Coulomb but any other long-range interaction. A model NaCl system with Coulomb interaction and a GaxNy system with defects are used to illustrate the advantage of our approach. At the same time, our approach helps to improve the prediction accuracy of some global properties such as the band gap, where the full atomic interaction beyond local atomic environments plays a very important role. In summary, our work expands the ability of current ML interatomic models and potentials to deal with long-range effects, paving a new way for accurate prediction of global properties and large-scale dynamic simulations of systems with defects.
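To make the real-space to reciprocal-space step above concrete, here is a minimal sketch that is not the authors' network. It assumes a cubic cell of side `cell_length` and uses the squared magnitude of the charge-weighted structure factor, |S(k)|^2 with S(k) = sum_j q_j exp(i k . r_j), as a toy global descriptor. The function names and the choice of descriptor are hypothetical, and the sketch only demonstrates invariance under translation, atom reordering, and wrapping into the cell, not the full Euclidean invariance claimed in the abstract.

```python
import cmath
import math


def structure_factor_power(positions, charges, k):
    """|S(k)|^2 for S(k) = sum_j q_j * exp(i k . r_j); sums over ALL atoms."""
    s = sum(
        q * cmath.exp(1j * sum(ki * ri for ki, ri in zip(k, r)))
        for r, q in zip(positions, charges)
    )
    return abs(s) ** 2


def reciprocal_descriptor(positions, charges, cell_length, n_max=1):
    """Toy global descriptor on the reciprocal lattice of a cubic cell (assumption)."""
    scale = 2 * math.pi / cell_length
    descriptor = []
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            for n3 in range(-n_max, n_max + 1):
                if (n1, n2, n3) == (0, 0, 0):
                    continue  # k = 0 only measures the total charge
                k = (scale * n1, scale * n2, scale * n3)
                descriptor.append(structure_factor_power(positions, charges, k))
    return descriptor
```

Because every atom contributes to every k-point, the descriptor carries information beyond any local cutoff. A rigid translation of all atoms multiplies S(k) by a common phase, so |S(k)|^2 is unchanged.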
Abstract:This work presents the Time-reversal Equivariant Neural Network (TENN) framework. TENN incorporates time-reversal symmetry into the equivariant neural network (ENN), generalizing the ENN to physical quantities that transform under time reversal, such as the spin and velocity of atoms. TENN-e3, the time-reversal extension of the E(3) equivariant neural network, is developed to preserve time-reversal E(3) equivariance, whether or not the spin-orbit effect is included, for both collinear and non-collinear magnetic moments in magnetic materials. TENN-e3 can construct spin neural network potentials and the Hamiltonians of magnetic materials from ab initio calculations. Time-reversal-E(3)-equivariant convolutions for interactions of spinors and geometric tensors are employed in TENN-e3. Compared to the popular ENN, TENN-e3 can describe complex spin-lattice coupling with high accuracy and preserve time-reversal symmetry, which is not preserved in existing E(3)-equivariant models. Also, the Hamiltonian of a magnetic material with time-reversal symmetry can be built with TENN-e3. TENN paves a new way to spin-lattice dynamics simulations over long time scales and electronic structure calculations of large-scale magnetic materials.
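As a small illustration of the symmetry constraint described above, consider what time reversal does to classical spins: each spin vector flips sign, so the total moment is odd under time reversal, while an energy built from spin-spin products is even. The sketch below uses a plain Heisenberg exchange energy as a stand-in; it is not the TENN-e3 model, and the function names and the value of `j_exchange` are assumptions for illustration.

```python
def heisenberg_energy(spins, bonds, j_exchange=1.0):
    """E = -J * sum over bonds (i, j) of S_i . S_j (toy stand-in for a spin potential)."""
    return -j_exchange * sum(
        sum(a * b for a, b in zip(spins[i], spins[j])) for i, j in bonds
    )


def time_reverse(spins):
    """Time reversal flips the sign of every spin vector."""
    return [tuple(-c for c in s) for s in spins]


def total_moment(spins):
    """Sum of all spin vectors; odd under time reversal."""
    return tuple(sum(s[axis] for s in spins) for axis in range(3))
```

A model of the energy that respects time-reversal symmetry must return the same value for `spins` and `time_reverse(spins)`, whereas a predicted vector quantity such as the total moment must change sign.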
Abstract:Machine learning, especially deep learning, can build a direct mapping from structure to properties with its huge parameter space, making it possible to perform high-throughput screening for the desired properties of materials. However, since the electronic Hamiltonian transforms non-trivially under rotation operations, it is challenging to accurately predict the electronic Hamiltonian while strictly satisfying this constraint. There is currently a lack of transferable machine learning models that can bypass the computationally demanding density functional theory (DFT) to obtain the ab initio Hamiltonian of molecules and materials by completely data-driven methods. In this work, we point out the necessity of explicitly considering the parity symmetry of the electronic Hamiltonian in addition to rotational equivariance. We propose a parameterized Hamiltonian that strictly satisfies rotational equivariance and parity symmetry simultaneously, based on which we develop an E(3) equivariant neural network called HamNet to predict the ab initio tight-binding Hamiltonian of various molecules and solids. The tests show that this model has similar transferability to that of machine learning potentials and can be applied to a class of materials with different configurations using the same set of trained network weights. The proposed framework provides a general transferable model for accelerating electronic structure calculations.
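The parity constraint mentioned above can be seen in a textbook setting. The sketch below uses the two-center Slater-Koster expressions for s-p and p-p hopping blocks, which is an assumption chosen for illustration and not the HamNet parameterization. Inverting the bond direction multiplies a block between orbitals of angular momenta l1 and l2 by (-1)^(l1 + l2), so the s-p block changes sign and the p-p block does not. The hopping parameters passed in are hypothetical numbers.

```python
import math


def unit(vector):
    """Normalize a 3-vector to direction cosines (l, m, n)."""
    norm = math.sqrt(sum(c * c for c in vector))
    return tuple(c / norm for c in vector)


def hop_s_p(direction, v_sp_sigma):
    """<s|H|p_i> = d_i * V_sp_sigma for i in (x, y, z)."""
    d = unit(direction)
    return [d[i] * v_sp_sigma for i in range(3)]


def hop_p_p(direction, v_pp_sigma, v_pp_pi):
    """<p_i|H|p_j> = d_i d_j (V_pp_sigma - V_pp_pi) + delta_ij V_pp_pi."""
    d = unit(direction)
    return [
        [d[i] * d[j] * (v_pp_sigma - v_pp_pi) + (v_pp_pi if i == j else 0.0)
         for j in range(3)]
        for i in range(3)
    ]
```

A network that is only rotation-equivariant, without tracking parity, could not distinguish these two behaviors under inversion, which is one way to read the abstract's argument for handling parity explicitly.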