Abstract:The application of Multiple Unmanned Aerial Vehicles (Multi-UAV) in Wilderness Search and Rescue (WiSAR) significantly enhances mission success due to their rapid coverage of search areas from high altitudes and their adaptability to complex terrains. This capability is particularly crucial because time is a critical factor in searching for a lost person in the wilderness; as time passes, survival rates decrease and the search area expands. The probability of success in such searches can be further improved if UAVs leverage terrain features to predict the lost person's position. In this paper, we aim to enhance search missions by proposing a smart agent-based probability model that combines Monte Carlo simulations with an agent strategy list, mimicking the behavior of a lost person in the wildness areas. Furthermore, we develop a distributed Multi-UAV receding horizon search strategy with dynamic partitioning, utilizing the generated probability density model as prior information to prioritize locations where the lost person is most likely to be found. Simulated search experiments across different terrains have been conducted to validate the search efficiency of the proposed methods compared to other benchmark methods.
Abstract:This paper proposes a comprehensive hierarchical control framework for autonomous decision-making arising in robotics and autonomous systems. In a typical hierarchical control architecture, high-level decision making is often characterised by discrete state and decision/control sets. However, a rational decision is usually affected by not only the discrete states of the autonomous system, but also the underlying continuous dynamics even the evolution of its operational environment. This paper proposes a holistic and comprehensive design process and framework for this type of challenging problems, from new modelling and design problem formulation to control design and stability analysis. It addresses the intricate interplay between traditional continuous systems dynamics utilized at the low levels for control design and discrete Markov decision processes (MDP) for facilitating high-level decision making. We model the decision making system in complex environments as a hybrid system consisting of a controlled MDP and autonomous (i.e. uncontrolled) continuous dynamics. Consequently, the new formulation is called as hybrid Markov decision process (HMDP). The design problem is formulated with a focus on ensuring both safety and optimality while taking into account the influence of both the discrete and continuous state variables of different levels. With the help of the model predictive control (MPC) concept, a decision maker design scheme is proposed for the proposed hybrid decision making model. By carefully designing key ingredients involved in this scheme, it is shown that the recursive feasibility and stability of the proposed autonomous decision making scheme are guaranteed. The proposed framework is applied to develop an autonomous lane changing system for intelligent vehicles.
Abstract:At the Dialogue Robot Competition 2023 (DRC2023), which was held to improve the capability of dialogue robots, our team developed a system that could build common ground and take more natural turns based on user utterance texts. Our system generated queries for sightseeing spot searches using the common ground and engaged in dialogue while waiting for user comprehension.
Abstract:The ability to understand spatial-temporal patterns for crowds of people is crucial for achieving long-term autonomy of mobile robots deployed in human environments. However, traditional historical data-driven memory models are inadequate for handling anomalies, resulting in poor reasoning by robot in estimating the crowd spatial distribution. In this article, a Receding Horizon Optimization (RHO) formulation is proposed that incorporates a Probability-related Partially Updated Memory (PPUM) for robot path planning in crowded environments with uncertainties. The PPUM acts as a memory layer that combines real-time sensor observations with historical knowledge using a weighted evidence fusion theory to improve robot's adaptivity to the dynamic environments. RHO then utilizes the PPUM as a informed knowledge to generate a path that minimizes the likelihood of encountering dense crowds while reducing the cost of local motion planning. The proposed approach provides an innovative solution to the problem of robot's long-term safe interaction with human in uncertain crowded environments. In simulation, the results demonstrate the superior performance of our approach compared to benchmark methods in terms of crowd distribution estimation accuracy, adaptability to anomalies and path planning efficiency.
Abstract:Motion forecasting plays a crucial role in autonomous driving, with the aim of predicting the future reasonable motions of traffic agents. Most existing methods mainly model the historical interactions between agents and the environment, and predict multi-modal trajectories in a feedforward process, ignoring potential trajectory changes caused by future interactions between agents. In this paper, we propose a novel Future Feedback Interaction Network (FFINet) to aggregate features the current observations and potential future interactions for trajectory prediction. Firstly, we employ different spatial-temporal encoders to embed the decomposed position vectors and the current position of each scene, providing rich features for the subsequent cross-temporal aggregation. Secondly, the relative interaction and cross-temporal aggregation strategies are sequentially adopted to integrate features in the current fusion module, observation interaction module, future feedback module and global fusion module, in which the future feedback module can enable the understanding of pre-action by feeding the influence of preview information to feedforward prediction. Thirdly, the comprehensive interaction features are further fed into final predictor to generate the joint predicted trajectories of multiple agents. Extensive experimental results show that our FFINet achieves the state-of-the-art performance on Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
Abstract:Recently, finetuning pretrained vision-language models (VLMs) has become one prevailing paradigm to achieve state-of-the-art performance in VQA. However, as VLMs scale, it becomes computationally expensive, storage inefficient, and prone to overfitting to tune full model parameters for a specific task in low-resource settings. Although current parameter-efficient tuning methods dramatically reduce the number of tunable parameters, there still exists a significant performance gap with full finetuning. In this paper, we propose \textbf{MixPHM}, a redundancy-aware parameter-efficient tuning method that outperforms full finetuning in low-resource VQA. Specifically, MixPHM is a lightweight module implemented by multiple PHM-experts in a mixture-of-experts manner. To reduce parameter redundancy, we reparameterize expert weights in a low-rank subspace and share part of the weights inside and across MixPHM. Moreover, based on our quantitative analysis of representation redundancy, we propose \textbf{redundancy regularization}, which facilitates MixPHM to reduce task-irrelevant redundancy while promoting task-relevant correlation. Experiments conducted on VQA v2, GQA, and OK-VQA with different low-resource settings show that our MixPHM outperforms state-of-the-art parameter-efficient methods and is the only one consistently surpassing full finetuning.
Abstract:Reinforcement learning-based (RL-based) energy management strategy (EMS) is considered a promising solution for the energy management of electric vehicles with multiple power sources. It has been shown to outperform conventional methods in energy management problems regarding energy-saving and real-time performance. However, previous studies have not systematically examined the essential elements of RL-based EMS. This paper presents an empirical analysis of RL-based EMS in a Plug-in Hybrid Electric Vehicle (PHEV) and Fuel Cell Electric Vehicle (FCEV). The empirical analysis is developed in four aspects: algorithm, perception and decision granularity, hyperparameters, and reward function. The results show that the Off-policy algorithm effectively develops a more fuel-efficient solution within the complete driving cycle compared with other algorithms. Improving the perception and decision granularity does not produce a more desirable energy-saving solution but better balances battery power and fuel consumption. The equivalent energy optimization objective based on the instantaneous state of charge (SOC) variation is parameter sensitive and can help RL-EMSs to achieve more efficient energy-cost strategies.
Abstract:The problem of robustness in adverse weather conditions is considered a significant challenge for computer vision algorithms in the applicants of autonomous driving. Image rain removal algorithms are a general solution to this problem. They find a deep connection between raindrops/rain-streaks and images by mining the hidden features and restoring information about the rain-free environment based on the powerful representation capabilities of neural networks. However, previous research has focused on architecture innovations and has yet to consider the vulnerability issues that already exist in neural networks. This research gap hints at a potential security threat geared toward the intelligent perception of autonomous driving in the rain. In this paper, we propose a universal rain-removal attack (URA) on the vulnerability of image rain-removal algorithms by generating a non-additive spatial perturbation that significantly reduces the similarity and image quality of scene restoration. Notably, this perturbation is difficult to recognise by humans and is also the same for different target images. Thus, URA could be considered a critical tool for the vulnerability detection of image rain-removal algorithms. It also could be developed as a real-world artificial intelligence attack method. Experimental results show that URA can reduce the scene repair capability by 39.5% and the image generation quality by 26.4%, targeting the state-of-the-art (SOTA) single-image rain-removal algorithms currently available.
Abstract:The high emission and low energy efficiency caused by internal combustion engines (ICE) have become unacceptable under environmental regulations and the energy crisis. As a promising alternative solution, multi-power source electric vehicles (MPS-EVs) introduce different clean energy systems to improve powertrain efficiency. The energy management strategy (EMS) is a critical technology for MPS-EVs to maximize efficiency, fuel economy, and range. Reinforcement learning (RL) has become an effective methodology for the development of EMS. RL has received continuous attention and research, but there is still a lack of systematic analysis of the design elements of RL-based EMS. To this end, this paper presents an in-depth analysis of the current research on RL-based EMS (RL-EMS) and summarizes the design elements of RL-based EMS. This paper first summarizes the previous applications of RL in EMS from five aspects: algorithm, perception scheme, decision scheme, reward function, and innovative training method. The contribution of advanced algorithms to the training effect is shown, the perception and control schemes in the literature are analyzed in detail, different reward function settings are classified, and innovative training methods with their roles are elaborated. Finally, by comparing the development routes of RL and RL-EMS, this paper identifies the gap between advanced RL solutions and existing RL-EMS. Finally, this paper suggests potential development directions for implementing advanced artificial intelligence (AI) solutions in EMS.
Abstract:Benefiting from large-scale Pretrained Vision-Language Models (VL-PMs), the performance of Visual Question Answering (VQA) has started to approach human oracle performance. However, finetuning large-scale VL-PMs with limited data for VQA usually faces overfitting and poor generalization issues, leading to a lack of robustness. In this paper, we aim to improve the robustness of VQA systems (ie, the ability of the systems to defend against input variations and human-adversarial attacks) from the perspective of Information Bottleneck when finetuning VL-PMs for VQA. Generally, internal representations obtained by VL-PMs inevitably contain irrelevant and redundant information for the downstream VQA task, resulting in statistically spurious correlations and insensitivity to input variations. To encourage representations to converge to a minimal sufficient statistic in vision-language learning, we propose the Correlation Information Bottleneck (CIB) principle, which seeks a tradeoff between representation compression and redundancy by minimizing the mutual information (MI) between the inputs and internal representations while maximizing the MI between the outputs and the representations. Meanwhile, CIB measures the internal correlations among visual and linguistic inputs and representations by a symmetrized joint MI estimation. Extensive experiments on five VQA benchmarks of input robustness and two VQA benchmarks of human-adversarial robustness demonstrate the effectiveness and superiority of the proposed CIB in improving the robustness of VQA systems.