Abstract:Developing an effective evaluation metric is crucial for accurately and swiftly measuring LiDAR perception performance. One major issue is the lack of metrics that can simultaneously generate fast and accurate evaluations based on either object detection or point cloud data. In this study, we propose a novel LiDAR perception entropy metric based on the probability of vehicle grid occupancy. This metric reflects the influence of point cloud distribution on vehicle detection performance. Based on this, we also introduce a LiDAR deployment optimization model, which is solved using a differential evolution-based particle swarm optimization algorithm. A comparative experiment demonstrated that the proposed PE-VGOP offers a correlation of more than 0.98 with vehicle detection ground truth in evaluating LiDAR perception performance. Furthermore, compared to the base deployment, field experiments indicate that the proposed optimization model can significantly enhance the perception capabilities of various types of LiDARs, including RS-16, RS-32, and RS-80. Notably, it achieves a 25% increase in detection Recall for the RS-32 LiDAR.
Abstract:Background: Colonoscopy, a crucial diagnostic tool in gastroenterology, depends heavily on superior bowel preparation. ChatGPT, a large language model with emergent intelligence which also exhibits potential in medical applications. This study aims to assess the accuracy and consistency of ChatGPT in using the Boston Bowel Preparation Scale (BBPS) for colonoscopy assessment. Methods: We retrospectively collected 233 colonoscopy images from 2020 to 2023. These images were evaluated using the BBPS by 3 senior endoscopists and 3 novice endoscopists. Additionally, ChatGPT also assessed these images, having been divided into three groups and undergone specific Fine-tuning. Consistency was evaluated through two rounds of testing. Results: In the initial round, ChatGPT's accuracy varied between 48.93% and 62.66%, trailing the endoscopists' accuracy of 76.68% to 77.83%. Kappa values for ChatGPT was between 0.52 and 0.53, compared to 0.75 to 0.87 for the endoscopists. Conclusion: While ChatGPT shows promise in bowel preparation scoring, it currently does not match the accuracy and consistency of experienced endoscopists. Future research should focus on in-depth Fine-tuning.
Abstract:Robots play a critical role as the physical agent of human operators in exploring the ocean. However, it remains challenging to grasp objects reliably while fully submerging under a highly pressurized aquatic environment with little visible light, mainly due to the fluidic interference on the tactile mechanics between the finger and object surfaces. This study investigates the transferability of grasping knowledge from on-land to underwater via a vision-based soft robotic finger that learns 6D forces and torques (FT) using a Supervised Variational Autoencoder (SVAE). A high-framerate camera captures the whole-body deformations while a soft robotic finger interacts with physical objects on-land and underwater. Results show that the trained SVAE model learned a series of latent representations of the soft mechanics transferrable from land to water, presenting a superior adaptation to the changing environments against commercial FT sensors. Soft, delicate, and reactive grasping enabled by tactile intelligence enhances the gripper's underwater interaction with improved reliability and robustness at a much-reduced cost, paving the path for learning-based intelligent grasping to support fundamental scientific discoveries in environmental and ocean research.
Abstract:Proprioception is the "sixth sense" that detects limb postures with motor neurons. It requires a natural integration between the musculoskeletal systems and sensory receptors, which is challenging among modern robots that aim for lightweight, adaptive, and sensitive designs at a low cost. Here, we present the Soft Polyhedral Network with an embedded vision for physical interactions, capable of adaptive kinesthesia and viscoelastic proprioception by learning kinetic features. This design enables passive adaptations to omni-directional interactions, visually captured by a miniature high-speed motion tracking system embedded inside for proprioceptive learning. The results show that the soft network can infer real-time 6D forces and torques with accuracies of 0.25/0.24/0.35 N and 0.025/0.034/0.006 Nm in dynamic interactions. We also incorporate viscoelasticity in proprioception during static adaptation by adding a creep and relaxation modifier to refine the predicted results. The proposed soft network combines simplicity in design, omni-adaptation, and proprioceptive sensing with high accuracy, making it a versatile solution for robotics at a low cost with more than 1 million use cycles for tasks such as sensitive and competitive grasping, and touch-based geometry reconstruction. This study offers new insights into vision-based proprioception for soft robots in adaptive grasping, soft manipulation, and human-robot interaction.
Abstract:Navigating automated driving systems (ADSs) through complex driving environments is difficult. Predicting the driving behavior of surrounding human-driven vehicles (HDVs) is a critical component of an ADS. This paper proposes an enhanced motion-planning approach for an ADS in a highway-merging scenario. The proposed enhanced approach utilizes the results of two aspects: the driving behavior and long-term trajectory of surrounding HDVs, which are coupled using a hierarchical model that is used for the motion planning of an ADS to improve driving safety.
Abstract:Benchmarking provides experimental evidence of the scientific baseline to enhance the progression of fundamental research, which is also applicable to robotics. In this paper, we propose a method to benchmark metrics of robotic manipulation, which addresses the spatial-temporal reasoning skills for robot learning with the jigsaw game. In particular, our approach exploits a simple set of jigsaw pieces by designing a structured protocol, which can be highly customizable according to a wide range of task specifications. Researchers can selectively adopt the proposed protocol to benchmark their research outputs, on a comparable scale in the functional, task, and system-level of details. The purpose is to provide a potential look-up table for learning-based robot manipulation, commonly available in other engineering disciplines, to facilitate the adoption of robotics through calculated, empirical, and systematic experimental evidence.
Abstract:Sophisticated traffic analytics, such as the encrypted traffic analytics and unknown malware detection, emphasizes the need for advanced methods to analyze the network traffic. Traditional methods of using fixed patterns, signature matching, and rules to detect known patterns in network traffic are being replaced with AI (Artificial Intelligence) driven algorithms. However, the absence of a high-performance AI networking-specific framework makes deploying real-time AI-based processing within networking workloads impossible. In this paper, we describe the design of Traffic Analytics Development Kits (TADK), an industry-standard framework specific for AI-based networking workloads processing. TADK can provide real-time AI-based networking workload processing in networking equipment from the data center out to the edge without the need for specialized hardware (e.g., GPUs, Neural Processing Unit, and so on). We have deployed TADK in commodity WAF and 5G UPF, and the evaluation result shows that TADK can achieve a throughput up to 35.3Gbps per core on traffic feature extraction, 6.5Gbps per core on traffic classification, and can decrease SQLi/XSS detection down to 4.5us per request with higher accuracy than fixed pattern solution.
Abstract:Clustering of hyperspectral images is a fundamental but challenging task. The recent development of hyperspectral image clustering has evolved from shallow models to deep and achieved promising results in many benchmark datasets. However, their poor scalability, robustness, and generalization ability, mainly resulting from their offline clustering scenarios, greatly limit their application to large-scale hyperspectral data. To circumvent these problems, we present a scalable deep online clustering model, named Spectral-Spatial Contrastive Clustering (SSCC), based on self-supervised learning. Specifically, we exploit a symmetric twin neural network comprised of a projection head with a dimensionality of the cluster number to conduct dual contrastive learning from a spectral-spatial augmentation pool. We define the objective function by implicitly encouraging within-cluster similarity and reducing between-cluster redundancy. The resulting approach is trained in an end-to-end fashion by batch-wise optimization, making it robust in large-scale data and resulting in good generalization ability for unseen data. Extensive experiments on three hyperspectral image benchmarks demonstrate the effectiveness of our approach and show that we advance the state-of-the-art approaches by large margins.
Abstract:This paper presents FLGC, a simple yet effective fully linear graph convolutional network for semi-supervised and unsupervised learning. Instead of using gradient descent, we train FLGC based on computing a global optimal closed-form solution with a decoupled procedure, resulting in a generalized linear framework and making it easier to implement, train, and apply. We show that (1) FLGC is powerful to deal with both graph-structured data and regular data, (2) training graph convolutional models with closed-form solutions improve computational efficiency without degrading performance, and (3) FLGC acts as a natural generalization of classic linear models in the non-Euclidean domain, e.g., ridge regression and subspace clustering. Furthermore, we implement a semi-supervised FLGC and an unsupervised FLGC by introducing an initial residual strategy, enabling FLGC to aggregate long-range neighborhoods and alleviate over-smoothing. We compare our semi-supervised and unsupervised FLGCs against many state-of-the-art methods on a variety of classification and clustering benchmarks, demonstrating that the proposed FLGC models consistently outperform previous methods in terms of accuracy, robustness, and learning efficiency. The core code of our FLGC is released at https://github.com/AngryCai/FLGC.
Abstract:Objectives: Functional connectivity triggered by naturalistic stimulus (e.g., movies) and machine learning techniques provide a great insight in exploring the brain functions such as fluid intelligence. However, functional connectivity are considered to be multi-layered, while traditional machine learning based on individual models not only are limited in performance, but also fail to extract multi-dimensional and multi-layered information from brain network. Methods: In this study, inspired by multi-layer brain network structure, we propose a new method namely Weighted Ensemble-model and Network Analysis, which combines the machine learning and graph theory for improved fluid intelligence prediction. Firstly, functional connectivity analysis and graphical theory were jointly employed. The functional connectivity and graphical indices computed using the preprocessed fMRI data were then all fed into auto-encoder parallelly for feature extraction to predict the fluid intelligence. In order to improve the performance, tree regression and ridge regression model were automatically stacked and fused with weighted values. Finally, layers of auto-encoder were visualized to better illustrate the connectome patterns, followed by the evaluation of the performance to justify the mechanism of brain functions. Results: Our proposed methods achieved best performance with 3.85 mean absolute deviation, 0.66 correlation coefficient and 0.42 R-squared coefficient, outperformed other state-of-the-art methods. It is also worth noting that, the optimization of the biological pattern extraction was automated though the auto-encoder algorithm. Conclusion: The proposed method not only outperforming the state-of-the-art reports, but also able to effectively capturing the biological patterns from functional connectivity during naturalistic movies state for potential clinical explorations.