Abstract:Effective processing, interpretation, and management of sensor data have emerged as a critical component of cyber-physical systems. Traditionally, processing sensor data requires profound theoretical knowledge and proficiency in signal-processing tools. However, recent works show that Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems. To explore this potential, we construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective. The benchmark incorporates diverse real-world sensor datasets for various tasks. The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks with parameter selections compared to engineering experts. Additionally, we investigate four prompting strategies for sensor processing and show that self-verification can outperform all other baselines in 48% of tasks. Our study provides a comprehensive benchmark and prompting analysis for future developments, paving the way toward an LLM-based sensor processing copilot.
Abstract:Nowadays, large-scale foundation models are being increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) within transportation, medical, and defence domains. Consequently, the inherent 'black-box' nature of these sophisticated deep neural networks heightens the significance of fostering mutual understanding and trust between humans and autonomous systems. To tackle the transparency challenges in HAT, this paper conducts a thoughtful study on the underexplored domain of Explainable Interface (EI) in HAT systems from a human-centric perspective, thereby enriching the existing body of research in Explainable Artificial Intelligence (XAI). We explore the design, development, and evaluation of EI within XAI-enhanced HAT systems. To do so, we first clarify the distinctions between these concepts: EI, explanations and model explainability, aiming to provide researchers and practitioners with a structured understanding. Second, we contribute to a novel framework for EI, addressing the unique challenges in HAT. Last, our summarized evaluation framework for ongoing EI offers a holistic perspective, encompassing model performance, human-centered factors, and group task objectives. Based on extensive surveys across XAI, HAT, psychology, and Human-Computer Interaction (HCI), this review offers multiple novel insights into incorporating XAI into HAT systems and outlines future directions.
Abstract:This study introduces an innovative violence detection framework tailored to the unique requirements of smart airports, where prompt responses to violent situations are crucial. The proposed framework harnesses the power of ViTPose for human pose estimation. It employs a CNN - BiLSTM network to analyse spatial and temporal information within keypoints sequences, enabling the accurate classification of violent behaviour in real time. Seamlessly integrated within the SAFE (Situational Awareness for Enhanced Security framework of SAAB, the solution underwent integrated testing to ensure robust performance in real world scenarios. The AIRTLab dataset, characterized by its high video quality and relevance to surveillance scenarios, is utilized in this study to enhance the model's accuracy and mitigate false positives. As airports face increased foot traffic in the post pandemic era, implementing AI driven violence detection systems, such as the one proposed, is paramount for improving security, expediting response times, and promoting data informed decision making. The implementation of this framework not only diminishes the probability of violent events but also assists surveillance teams in effectively addressing potential threats, ultimately fostering a more secure and protected aviation sector. Codes are available at: https://github.com/Asami-1/GDP.
Abstract:Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks and lack systematic summaries and research directions in the future. Our work is divided into 3 independent articles and the first part is a Survey of Surveys (SoS) for total technologies of AD and IVs that involves the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. This is the second part (Part \uppercase\expandafter{\romannumeral1} for this technical survey) to review the development of control, computing system design, communication, High Definition map (HD map), testing, and human behaviors in IVs. In addition, the third part (Part \uppercase\expandafter{\romannumeral2} for this technical survey) is to review the perception and planning sections. The objective of this paper is to involve all the sections of AD, summarize the latest technical milestones, and guide abecedarians to quickly understand the development of AD and IVs. Combining the SoS and Part \uppercase\expandafter{\romannumeral2}, we anticipate that this work will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future.
Abstract:Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks, lack of systematic summary and research directions in the future. Here we propose a Survey of Surveys (SoS) for total technologies of AD and IVs that reviews the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. To our knowledge, this article is the first SoS with milestones in AD and IVs, which constitutes our complete research work together with two other technical surveys. We anticipate that this article will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future.
Abstract:Future airports are becoming more complex and congested with the increasing number of travellers. While the airports are more likely to become hotspots for potential conflicts to break out which can cause serious delays to flights and several safety issues. An intelligent algorithm which renders security surveillance more effective in detecting conflicts would bring many benefits to the passengers in terms of their safety, finance, and travelling efficiency. This paper details the development of a machine learning model to classify conflicting behaviour in a crowd. HRNet is used to segment the images and then two approaches are taken to classify the poses of people in the frame via multiple classifiers. Among them, it was found that the support vector machine (SVM) achieved the most performant achieving precision of 94.37%. Where the model falls short is against ambiguous behaviour such as a hug or losing track of a subject in the frame. The resulting model has potential for deployment within an airport if improvements are made to cope with the vast number of potential passengers in view as well as training against further ambiguous behaviours which will arise in an airport setting. In turn, will provide the capability to enhance security surveillance and improve airport safety.
Abstract:Integrating trajectory prediction to the decision-making and planning modules of modular autonomous driving systems is expected to improve the safety and efficiency of self-driving vehicles. However, a vehicle's future trajectory prediction is a challenging task since it is affected by the social interactive behaviors of neighboring vehicles, and the number of neighboring vehicles can vary in different situations. This work proposes a GNN-RNN based Encoder-Decoder network for interaction-aware trajectory prediction, where vehicles' dynamics features are extracted from their historical tracks using RNN, and the inter-vehicular interaction is represented by a directed graph and encoded using a GNN. The parallelism of GNN implies the proposed method's potential to predict multi-vehicular trajectories simultaneously. Evaluation on the dataset extracted from the NGSIM US-101 dataset shows that the proposed model is able to predict a target vehicle's trajectory in situations with a variable number of surrounding vehicles.
Abstract:Simultaneous trajectory prediction for multiple heterogeneous traffic participants is essential for the safe and efficient operation of connected automated vehicles under complex driving situations in the real world. The multi-agent prediction task is challenging, as the motions of traffic participants are affected by many factors, including their individual dynamics, their interactions with surrounding agents, the traffic infrastructures, and the number and modalities of the target agents. To further advance the trajectory prediction techniques, in this work we propose a three-channel framework together with a novel Heterogeneous Edge-enhanced graph ATtention network (HEAT), which is able to deal with the heterogeneity of the target agents and traffic participants involved. Specifically, the agent's dynamics are extracted from their historical states using type-specific encoders. The inter-agent interactions are represented with a directed edge-featured heterogeneous graph, and then interaction features are extracted using the proposed HEAT network. Besides, the map features are shared across all agents by introducing a selective gate mechanism. And finally, the trajectories of multi-agent are executed simultaneously. Validations using both urban and highway driving datasets show that the proposed model can realize simultaneous trajectory predictions for multiple agents under complex traffic situations, and achieve state-of-the-art performance with respect to prediction accuracy, demonstrating its feasibility and effectiveness.
Abstract:Due to the limited smartness and abilities of machine intelligence, currently autonomous vehicles are still unable to handle all kinds of situations and completely replace drivers. Because humans exhibit strong robustness and adaptability in complex driving scenarios, it is of great importance to introduce humans into the training loop of artificial intelligence, leveraging human intelligence to further advance machine learning algorithms. In this study, a real-time human-guidance-based deep reinforcement learning (Hug-DRL) method is developed for policy training of autonomous driving. Leveraging a newly designed control transfer mechanism between human and automation, human is able to intervene and correct the agent's unreasonable actions in real time when necessary during the model training process. Based on this human-in-the-loop guidance mechanism, an improved actor-critic architecture with modified policy and value networks is developed. The fast convergence of the proposed Hug-DRL allows real-time human guidance actions to be fused into the agent's training loop, further improving the efficiency and performance of deep reinforcement learning. The developed method is validated by human-in-the-loop experiments with 40 subjects and compared with other state-of-the-art learning approaches. The results suggest that the proposed method can effectively enhance the training efficiency and performance of the deep reinforcement learning algorithm under human guidance, without imposing specific requirements on participant expertise and experience.
Abstract:Predicting the future trajectory of surrounding vehicles is essential for the navigation of autonomous vehicles in complex real-world driving scenarios. It is challenging as a vehicle's motion is affected by many factors, including its surrounding infrastructures and vehicles. In this work, we develop the ReCoG (Recurrent Convolutional and Graph Neural Networks), which is a general scheme that represents vehicle interactions with infrastructure information as a heterogeneous graph and applies graph neural networks (GNNs) to model the high-level interactions for trajectory prediction. Nodes in the graph contain corresponding features, where a vehicle node contains its sequential feature encoded using Recurrent Neural Network (RNN), and an infrastructure node contains spatial feature encoded using Convolutional Neural Network (CNN). Then the ReCoG predicts the future trajectory of the target vehicle by jointly considering all of the features. Experiments are conducted by using the INTERACTION dataset. Experimental results show that the proposed ReCoG outperforms other state-of-the-art methods in terms of different types of displacement error, validating the feasibility and effectiveness of the developed approach.