Abstract:Behavior cloning (BC) is a popular supervised imitation learning method in the societies of robotics, autonomous driving, etc., wherein complex skills can be learned by direct imitation from expert demonstrations. Despite its rapid development, it is still affected by limited field of view where accumulation of sensors and joint noise bring compounding errors. In this paper, we introduced geometrically and historically constrained behavior cloning (GHCBC) to dominantly consider high-level state information inspired by neuroscientists, wherein the geometrically constrained behavior cloning were used to geometrically constrain predicting poses, and the historically constrained behavior cloning were utilized to temporally constrain action sequences. The synergy between these two types of constrains enhanced the BC performance in terms of robustness and stability. Comprehensive experimental results showed that success rates were improved by 29.73% in simulation and 39.4% in real robot experiments in average, respectively, compared to state-of-the-art BC method, especially in long-term operational scenes, indicating great potential of using the GHCBC for robotic learning.
Abstract:Industrial managements, including quality control, cost and safety optimization, etc., heavily rely on high quality industrial human action recognitions (IHARs) which were hard to be implemented in large-scale industrial scenes due to their high costs and poor real-time performance. In this paper, we proposed a large-scale foundation model(LSFM)-based IHAR method, wherein various LSFMs and lightweight methods were jointly used, for the first time, to fulfill low-cost dataset establishment and real-time IHARs. Comprehensive tests on in-situ large-scale industrial manufacturing lines elucidated that the proposed method realized great reduction on employment costs, superior real-time performance, and satisfactory accuracy and generalization capabilities, indicating its great potential as a backbone IHAR method, especially for large-scale industrial applications.
Abstract:Remarkable successes were made in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibited imperceptible similarity with raw data, raising serious concerns on network robustness. Although adversarial training (AT), in responding to malevolent AEs, was recognized as an effective approach to improve robustness, it was challenging to overcome generalization decline of networks caused by the AT. In this paper, in order to reserve high generalization while improving robustness, we proposed a dynamic perturbation-adaptive adversarial training (DPAAT) method, which placed AT in a dynamic learning environment to generate adaptive data-level perturbations and provided a dynamically updated criterion by loss information collections to handle the disadvantage of fixed perturbation sizes in conventional AT methods and the dependence on external transference. Comprehensive testing on dermatology HAM10000 dataset showed that the DPAAT not only achieved better robustness improvement and generalization preservation but also significantly enhanced mean average precision and interpretability on various CNNs, indicating its great potential as a generic adversarial training method on the MIC.
Abstract:Deep neural networks were significantly vulnerable to adversarial examples manipulated by malicious tiny perturbations. Although most conventional adversarial attacks ensured the visual imperceptibility between adversarial examples and corresponding raw images by minimizing their geometric distance, these constraints on geometric distance led to limited attack transferability, inferior visual quality, and human-imperceptible interpretability. In this paper, we proposed a supervised semantic-transformation generative model to generate adversarial examples with real and legitimate semantics, wherein an unrestricted adversarial manifold containing continuous semantic variations was constructed for the first time to realize a legitimate transition from non-adversarial examples to adversarial ones. Comprehensive experiments on MNIST and industrial defect datasets showed that our adversarial examples not only exhibited better visual quality but also achieved superior attack transferability and more effective explanations for model vulnerabilities, indicating their great potential as generic adversarial examples. The code and pre-trained models were available at https://github.com/shuaili1027/MAELS.git.
Abstract:Stereo visual odometry is widely used where a robot tracks its position and orientation using stereo cameras. Most of the approaches recovered mobile robotics motion based on the matching and tracking of point features along a sequence of stereo images. But in low-textured and dynamic scenes, there are no sufficient robust static point features for motion estimation, causing lots of previous work to fail to reconstruct the robotic motion. However, line features can be detected in such low-textured and dynamic scenes. In this paper, we proposed DynPL-SVO, a stereo visual odometry with the $dynamic$ $grid$ algorithm and the cost function containing both vertical and horizontal information of the line features. Stereo camera motion was obtained through Levenberg-Marquard minimization of re-projection error of point and line features. The experimental results on the KITTI and EuRoC MAV datasets showed that the DynPL-SVO had a competitive performance when compared to other state-of-the-art systems by producing more robust and accurate motion estimation, especially in low-textured and dynamic scenes.