Abstract: Industry 4.0 has witnessed the rise of complex robots fueled by the integration of Artificial Intelligence/Machine Learning (AI/ML) and Digital Twin (DT) technologies. While these technologies offer numerous benefits, they also introduce potential privacy and security risks. This paper surveys privacy attacks targeting robots enabled by AI and DT models. Exfiltration and data leakage of ML models are discussed, in addition to the potential extraction of models derived from first principles (e.g., physics-based models). We also discuss design considerations for DT-integrated robotics, touching on how ML model training, responsible AI and DT safeguards, data governance, and ethical considerations affect the effectiveness of these attacks. We advocate for a trusted autonomy approach, emphasizing the need to combine robotics, AI, and DT technologies with robust ethical frameworks and trustworthiness principles to build secure and reliable AI robotic systems.
Abstract: Robotics and Artificial Intelligence (AI) have been inextricably intertwined since their inception. Today, AI-Robotics systems have become an integral part of our daily lives, from robotic vacuum cleaners to semi-autonomous cars. These systems are built upon three fundamental architectural elements: perception, navigation and planning, and control. However, while the integration of AI-Robotics systems has enhanced the quality of our lives, it has also presented a serious problem: these systems are vulnerable to security attacks. The physical components, algorithms, and data that make up AI-Robotics systems can be exploited by malicious actors, potentially leading to dire consequences. Motivated by the need to address these security concerns, this paper presents a comprehensive survey and taxonomy across three dimensions: attack surfaces, ethical and legal concerns, and Human-Robot Interaction (HRI) security. Our goal is to provide users, developers, and other stakeholders with a holistic understanding of these areas to enhance overall AI-Robotics system security. We begin by surveying potential attack surfaces and providing mitigating defensive strategies. We then delve into ethical issues, such as dependency and psychological impact, as well as legal concerns regarding accountability for these systems. In addition, emerging trends such as HRI are discussed, considering privacy, integrity, safety, trustworthiness, and explainability concerns. Finally, we present our vision for future research directions in this dynamic and promising field.
Abstract: A major challenge with off-road autonomous navigation is the lack of maps or road markings that can be used to plan a path for autonomous robots. Classical path planning methods mostly assume a perfectly known environment, without accounting for the inherent perception and sensing uncertainty in detecting terrain and obstacles in off-road environments. Recent work in computer vision and deep neural networks has advanced the capability of terrain traversability segmentation from raw images; however, the feasibility of using these noisy segmentation maps for navigation and path planning has not been adequately explored. To address this problem, this research proposes URA*, an uncertainty-aware path planning method that uses aerial images for autonomous navigation in off-road environments. An ensemble convolutional neural network (CNN) model is first used to perform pixel-level traversability estimation from aerial images of the region of interest. The traversability predictions are represented as a grid of traversal probability values. An uncertainty-aware planner is then applied to compute the best path from a start point to a goal point given these noisy traversal probability estimates. The proposed planner also incorporates replanning techniques to allow rapid replanning during online robot operation. The proposed method is evaluated on the Massachusetts Road Dataset, the DeepGlobe dataset, as well as a dataset of aerial images from off-road proving grounds at Mississippi State University. Results show that the proposed image segmentation and planning methods outperform conventional planning algorithms in terms of the quality and feasibility of the initial path, as well as the quality of replanned paths.
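To illustrate the planning step described in this abstract, the following is a minimal sketch of uncertainty-aware grid search in Python. It assumes the traversability predictions have already been converted into a 2D grid of traversal probabilities; the edge-cost formula (Euclidean step length plus a -log probability penalty), the p_min floor, and the function name uncertainty_aware_astar are illustrative choices, not the exact URA* formulation, and the replanning machinery is not reproduced.

```python
import heapq
import itertools
import math

def uncertainty_aware_astar(prob_grid, start, goal, p_min=0.05):
    """A*-style search over a grid of traversal probabilities.

    prob_grid[r][c] is the estimated probability that cell (r, c) is
    traversable. Each step costs its Euclidean length plus -log(p) of the
    destination cell, so uncertain cells are penalized but not forbidden.
    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(prob_grid), len(prob_grid[0])
    h = lambda c: math.hypot(c[0] - goal[0], c[1] - goal[1])  # admissible heuristic

    counter = itertools.count()                 # heap tie-breaker
    open_heap = [(h(start), next(counter), start)]
    g_cost = {start: 0.0}
    came_from = {start: None}
    closed = set()

    while open_heap:
        _, _, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue
        if cur == goal:
            path = []
            while cur is not None:              # walk parents back to start
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        closed.add(cur)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nxt = (cur[0] + dr, cur[1] + dc)
                if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue
                p = max(prob_grid[nxt[0]][nxt[1]], p_min)
                ng = g_cost[cur] + math.hypot(dr, dc) - math.log(p)
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), next(counter), nxt))
    return None
```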
Abstract: This paper investigates the impact of LiDAR configuration shifts on the performance of 3D LiDAR point cloud semantic segmentation models, a topic not extensively studied before. We explore the effect of using different LiDAR channel counts when training and testing a 3D LiDAR point cloud semantic segmentation model, utilizing Cylinder3D for the experiments. A Cylinder3D model is trained and tested on simulated 3D LiDAR point cloud datasets created using the Mississippi State University Autonomous Vehicle Simulator (MAVS) and on 32- and 64-channel 3D LiDAR point clouds of the RELLIS-3D dataset collected in a real-world off-road environment. Our experimental results demonstrate that sensor and spatial domain shifts significantly impact the performance of LiDAR-based semantic segmentation models. In the absence of spatial domain changes between training and testing, models trained and tested on the same sensor type generally exhibited better performance. Moreover, higher-resolution sensors showed improved performance compared to lower-resolution ones. However, results varied when spatial domain changes were present. In some cases, a sensor's higher resolution led to better performance both with and without sensor domain shifts. In other instances, the higher resolution resulted in overfitting within a specific domain, causing a lack of generalization capability and decreased performance when tested on data with different sensor configurations.
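The domain-shift comparisons described here reduce to computing the same segmentation metric for every (training sensor, test sensor) pairing. The snippet below is a small, hypothetical evaluation sketch using per-class IoU with NumPy; the function name and the example class count are illustrative and not taken from the paper.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Per-class IoU between flat integer label arrays of equal length.

    A minimal metric sketch for comparing a model trained on one LiDAR
    configuration and tested on another.
    """
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)

# Hypothetical usage: evaluate every (train sensor, test sensor) pair and
# compare mean IoU, e.g.
# results[(train_cfg, test_cfg)] = np.nanmean(per_class_iou(pred, gt, 20))
```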
Abstract: Building Information Modeling (BIM) technology is a key component of modern construction engineering and project management workflows. As-is BIM models that represent the spatial reality of a project site can offer crucial information to stakeholders for construction progress monitoring, error checking, and building maintenance purposes. Geometric methods for automatically converting raw scan data into BIM models (Scan-to-BIM) often fail to make use of higher-level semantic information in the data, whereas semantic segmentation methods only output labels at the point level without creating the object-level models necessary for BIM. To address these issues, this research proposes a hybrid semantic-geometric approach for clutter-resistant floorplan generation from laser-scanned building point clouds. The input point clouds are first pre-processed by normalizing the coordinate system and removing outliers. Then, a semantic segmentation network based on PointNet++ is used to label each point as ceiling, floor, wall, door, stair, or clutter. The clutter points are removed, whereas the wall, door, and stair points are used for 2D floorplan generation. A region-growing segmentation algorithm paired with geometric reasoning rules is applied to group the points into individual building elements. Finally, a 2-fold Random Sample Consensus (RANSAC) algorithm is applied to parameterize the building elements into 2D lines, which are used to create the output floorplan. The proposed method is evaluated using the metrics of precision, recall, Intersection-over-Union (IOU), Betti error, and warping error.
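As a concrete illustration of the final parameterization step, here is a simplified single-model RANSAC line fit on projected 2D wall points. The function name, iteration count, inlier tolerance, and the implicit-line parameterization are illustrative assumptions; the paper's 2-fold RANSAC variant and its geometric reasoning rules are not reproduced.

```python
import numpy as np

def ransac_line_2d(points, n_iters=500, inlier_tol=0.02, rng=None):
    """Fit one 2D line (ax + by + c = 0) to projected wall points with RANSAC.

    points: (N, 2) array of 2D coordinates (e.g., in metres).
    Returns the best (a, b, c) model and a boolean inlier mask.
    """
    rng = rng or np.random.default_rng(0)
    best_model, best_inliers = None, None
    for _ in range(n_iters):
        # Sample two distinct points and build the line through them.
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        direction = p2 - p1
        norm = np.linalg.norm(direction)
        if norm < 1e-9:
            continue
        a, b = -direction[1] / norm, direction[0] / norm   # unit normal
        c = -(a * p1[0] + b * p1[1])
        # Point-to-line distances and inlier count for this candidate.
        dist = np.abs(points @ np.array([a, b]) + c)
        inliers = dist < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers
```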
Abstract: Robotics, automation, and related Artificial Intelligence (AI) systems have become pervasive, bringing with them concerns related to security, safety, accuracy, and trust. With growing dependency on physical robots that work in close proximity to humans, the security of these systems is becoming increasingly important to prevent cyber-attacks that could lead to privacy invasion, sabotage of critical operations, and bodily harm. The current shortfall of professionals who can defend such systems demands the development and integration of a dedicated curriculum. This course description includes details about seven self-contained and adaptive modules on "AI security threats against pervasive robotic systems". Topics include: 1) Introduction, examples of attacks, and motivation; 2) Robotic AI attack surfaces and penetration testing; 3) Attack patterns and security strategies for input sensors; 4) Training attacks and associated security strategies; 5) Inference attacks and associated security strategies; 6) Actuator attacks and associated security strategies; and 7) Ethics of AI, robotics, and cybersecurity.
Abstract: Contrastive learning has recently demonstrated superior performance to supervised learning, despite requiring no training labels. We explore how contrastive learning can be applied to hundreds of thousands of unlabeled Mars terrain images, collected from the Mars rovers Curiosity and Perseverance, and from the Mars Reconnaissance Orbiter. Such methods are appealing since the vast majority of Mars images are unlabeled, as manual annotation is labor-intensive and requires extensive domain knowledge. Contrastive learning, however, assumes that any given pair of distinct images contains distinct semantic content. This is an issue for Mars image datasets, as any two Mars images are far more likely to be semantically similar due to the lack of visual diversity on the planet's surface. Assuming that pairs of images are in visual contrast when they are in fact not results in pairs that are falsely treated as negatives, impacting training performance. In this study, we propose two approaches to resolve this: 1) an unsupervised deep clustering step on the Mars datasets, which identifies clusters of images containing similar semantic content and corrects false negative errors during training, and 2) a simple approach which mixes data from different domains to increase the visual diversity of the total training dataset. Both cases reduce the rate of false negative pairs, thus minimizing the rate at which the model is incorrectly penalized during contrastive training. These modified approaches remain fully unsupervised end-to-end. To evaluate their performance, we add a single linear layer trained to generate class predictions based on these contrastively learned features and demonstrate increased performance compared to supervised models, observing an improvement in classification accuracy of 3.06% using only 10% of the labeled data.
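The first approach (cluster-based false negative correction) can be illustrated with a masked InfoNCE-style loss: candidate negatives that share a cluster assignment with the anchor are removed from the denominator instead of being penalized. The PyTorch sketch below is illustrative only; the function name and temperature are assumptions, and the authors' exact loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def cluster_masked_info_nce(z1, z2, cluster_ids, temperature=0.1):
    """Contrastive loss that ignores likely false negatives.

    z1, z2: (N, D) embeddings of two augmented views of the same batch.
    cluster_ids: (N,) pseudo-labels from an unsupervised clustering step.
    Different images that share a cluster are dropped from the negative
    set rather than being pushed apart.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # (N, N) similarities
    n = z1.size(0)
    same_cluster = cluster_ids.unsqueeze(0) == cluster_ids.unsqueeze(1)
    eye = torch.eye(n, dtype=torch.bool, device=z1.device)
    false_neg = same_cluster & ~eye                        # off-diagonal, same cluster
    logits = logits.masked_fill(false_neg, float("-inf"))  # exclude false negatives
    targets = torch.arange(n, device=z1.device)            # positives on the diagonal
    return F.cross_entropy(logits, targets)
```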
Abstract: Planetary rover missions must utilize machine learning-based perception to continue extra-terrestrial exploration with little to no human presence. Martian terrain segmentation has been critical for rover navigation and hazard avoidance in order to perform further exploratory tasks, e.g., soil sample collection and searching for organic compounds. Current Martian terrain segmentation models require a large amount of labeled data to achieve acceptable performance, and also require retraining for deployment across different domains, i.e., different rover missions, or different tasks, i.e., geological identification and navigation. This research proposes a semi-supervised learning approach that leverages unsupervised contrastive pretraining of a backbone for multi-mission semantic segmentation of Martian surfaces. This model expands upon current Martian segmentation capabilities by being deployable across different Martian rover missions for terrain navigation, utilizing a mixed-domain training set that ensures feature diversity. Evaluation results using average pixel accuracy show that the semi-supervised mixed-domain approach improves accuracy compared to single-domain training and supervised training, reaching an accuracy of 97% for the Mars Science Laboratory's Curiosity Rover and 79.6% for the Mars 2020 Perseverance Rover. Further, applying different weighting methods to the loss function improved the model's correct predictions for minority or rare classes by over 30% on the recall metric compared to standard cross-entropy loss. These results can inform future multi-mission and multi-task semantic segmentation for rover missions in a data-efficient manner.
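One common way to realize the loss-weighting idea mentioned above is inverse-frequency class weighting in a pixel-wise cross-entropy loss. The PyTorch sketch below shows that scheme; the helper name, the normalization, and the example class counts are illustrative assumptions, and the specific weighting methods evaluated in the paper may differ.

```python
import torch
import torch.nn as nn

def inverse_frequency_weights(label_counts):
    """Class weights inversely proportional to pixel frequency.

    Rare terrain classes receive larger weights so the segmentation loss
    does not ignore them; weights are rescaled to average around 1.0.
    """
    counts = torch.as_tensor(label_counts, dtype=torch.float32)
    freq = counts / counts.sum()
    weights = 1.0 / (freq + 1e-6)
    return weights / weights.sum() * len(counts)

# Hypothetical usage with logits of shape (batch, num_classes, H, W) and
# integer labels of shape (batch, H, W); the counts below are made up.
weights = inverse_frequency_weights([5_000_000, 120_000, 40_000, 2_000_000])
criterion = nn.CrossEntropyLoss(weight=weights)
```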
Abstract: Planetary rover systems need to perform terrain segmentation to identify drivable areas as well as specific types of soil for sample collection. The latest Martian terrain segmentation methods rely on supervised learning, which is very data-hungry and difficult to train when only a small number of labeled samples is available. Moreover, the semantic classes are defined differently for different applications (e.g., rover traversal vs. geological), and as a result the network has to be trained from scratch each time, which is an inefficient use of resources. This research proposes a semi-supervised learning framework for Mars terrain segmentation in which a deep segmentation network trained in an unsupervised manner on unlabeled images is transferred to the task of terrain segmentation using only a few labeled images. The network incorporates a backbone module, which is trained using a contrastive loss function, and an output atrous convolution module, which is trained using a pixel-wise cross-entropy loss function. Evaluation results using the metric of segmentation accuracy show that the proposed method with contrastive pretraining outperforms plain supervised learning by 2%-10%. Moreover, the proposed model is able to achieve a segmentation accuracy of 91.1% using only 161 training images (1% of the original dataset), compared to 81.9% with plain supervised learning.
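The two-stage training described here (contrastive pretraining of the backbone, then supervised fine-tuning of the backbone plus atrous head on a few labeled images) can be sketched as follows in PyTorch. The class and function names, optimizer choice, and hyperparameters are placeholders, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SegModel(nn.Module):
    """Backbone pretrained with a contrastive loss plus an atrous output head.

    A structural sketch only: `backbone` and `aspp_head` stand in for the
    paper's encoder and atrous convolution module, whose exact architectures
    are not reproduced here.
    """
    def __init__(self, backbone, aspp_head):
        super().__init__()
        self.backbone, self.head = backbone, aspp_head

    def forward(self, x):
        return self.head(self.backbone(x))

def finetune(model, pretrained_backbone_state, loader, epochs=50, lr=1e-3):
    # Stage 2: load the contrastively pretrained backbone weights, then train
    # end-to-end on the few labeled images with pixel-wise cross-entropy.
    model.backbone.load_state_dict(pretrained_backbone_state)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = ce(model(images), labels)
            loss.backward()
            opt.step()
    return model
```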
Abstract: 3D point cloud segmentation is an important function that helps robots understand the layout of their surrounding environment and perform tasks such as grasping objects, avoiding obstacles, and finding landmarks. Current segmentation methods are mostly class-specific; many are tuned to work with specific object categories and may not generalize to different types of scenes. This research proposes a learnable region growing method for class-agnostic point cloud segmentation, specifically for the task of instance label prediction. The proposed method is able to segment any class of objects using a single deep neural network without any assumptions about their shapes and sizes. The deep neural network is trained to predict how to add or remove points from a point cloud region to morph it into incrementally more complete regions of an object instance. Segmentation results on the S3DIS and ScanNet datasets show that the proposed method outperforms competing methods by 1%-9% on 6 different evaluation metrics.
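The core idea, iteratively adding or removing points to grow an instance region, can be sketched as a simple inference loop. In the Python sketch below, policy_net is a stand-in for the learned network described in the abstract; its interface, the score threshold, and the convergence check are illustrative assumptions.

```python
import torch

def grow_region(points, seed_idx, policy_net, num_steps=10, threshold=0.5):
    """Iteratively grow an instance region from a seed point.

    points: (N, 3) tensor of point coordinates.
    policy_net: assumed callable that takes the point cloud and the current
    membership mask and returns per-point scores of belonging to the instance.
    Returns a boolean membership mask of shape (N,).
    """
    n = points.size(0)
    region = torch.zeros(n, dtype=torch.bool, device=points.device)
    region[seed_idx] = True
    for _ in range(num_steps):
        with torch.no_grad():
            # Per-point probability of belonging to the instance, conditioned
            # on the current region membership.
            scores = policy_net(points, region.float())
        new_region = scores > threshold
        new_region[seed_idx] = True           # always keep the seed point
        if torch.equal(new_region, region):   # converged: membership unchanged
            break
        region = new_region
    return region
```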