Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiaping Xiao

Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding

Jun 12, 2025

Yuhang Zhang, Haosheng Yu, Jiaping Xiao, Mir Feroskhan

Abstract:Vision-and-language navigation (VLN) is a long-standing challenge in autonomous robotics, aiming to empower agents with the ability to follow human instructions while navigating complex environments. Two key bottlenecks remain in this field: generalization to out-of-distribution environments and reliance on fixed discrete action spaces. To address these challenges, we propose Vision-Language Fly (VLFly), a framework tailored for Unmanned Aerial Vehicles (UAVs) to execute language-guided flight. Without the requirement for localization or active ranging sensors, VLFly outputs continuous velocity commands purely from egocentric observations captured by an onboard monocular camera. The VLFly integrates three modules: an instruction encoder based on a large language model (LLM) that reformulates high-level language into structured prompts, a goal retriever powered by a vision-language model (VLM) that matches these prompts to goal images via vision-language similarity, and a waypoint planner that generates executable trajectories for real-time UAV control. VLFly is evaluated across diverse simulation environments without additional fine-tuning and consistently outperforms all baselines. Moreover, real-world VLN tasks in indoor and outdoor environments under direct and indirect instructions demonstrate that VLFly achieves robust open-vocabulary goal understanding and generalized navigation capabilities, even in the presence of abstract language input.

Via

Access Paper or Ask Questions

FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation

May 27, 2025

Jiaping Xiao, Cheng Wen Tsao, Yuhang Zhang, Mir Feroskhan

Abstract:Path planning is a critical component in autonomous drone operations, enabling safe and efficient navigation through complex environments. Recent advances in foundation models, particularly large language models (LLMs) and vision-language models (VLMs), have opened new opportunities for enhanced perception and intelligent decision-making in robotics. However, their practical applicability and effectiveness in global path planning remain relatively unexplored. This paper proposes foundation model-guided path planners (FM-Planner) and presents a comprehensive benchmarking study and practical validation for drone path planning. Specifically, we first systematically evaluate eight representative LLM and VLM approaches using standardized simulation scenarios. To enable effective real-time navigation, we then design an integrated LLM-Vision planner that combines semantic reasoning with visual perception. Furthermore, we deploy and validate the proposed path planner through real-world experiments under multiple configurations. Our findings provide valuable insights into the strengths, limitations, and feasibility of deploying foundation models in real-world drone applications and providing practical implementations in autonomous flight. Project site: https://github.com/NTU-ICG/FM-Planner.

* This work has been submitted for possible publication

Via

Access Paper or Ask Questions

Learning Resilient Formation Control of Drones with Graph Attention Network

Sep 03, 2024

Jiaping Xiao, Xu Fang, Qianlei Jia, Mir Feroskhan

Abstract:The rapid advancement of drone technology has significantly impacted various sectors, including search and rescue, environmental surveillance, and industrial inspection. Multidrone systems offer notable advantages such as enhanced efficiency, scalability, and redundancy over single-drone operations. Despite these benefits, ensuring resilient formation control in dynamic and adversarial environments, such as under communication loss or cyberattacks, remains a significant challenge. Classical approaches to resilient formation control, while effective in certain scenarios, often struggle with complex modeling and the curse of dimensionality, particularly as the number of agents increases. This paper proposes a novel, learning-based formation control for enhancing the adaptability and resilience of multidrone formations using graph attention networks (GATs). By leveraging GAT's dynamic capabilities to extract internode relationships based on the attention mechanism, this GAT-based formation controller significantly improves the robustness of drone formations against various threats, such as Denial of Service (DoS) attacks. Our approach not only improves formation performance in normal conditions but also ensures the resilience of multidrone systems in variable and adversarial environments. Extensive simulation results demonstrate the superior performance of our method over baseline formation controllers. Furthermore, the physical experiments validate the effectiveness of the trained control policy in real-world flights.

* This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

Clustering-based Learning for UAV Tracking and Pose Estimation

May 27, 2024

Jiaping Xiao, Phumrapee Pisutsin, Cheng Wen Tsao, Mir Feroskhan

Abstract:UAV tracking and pose estimation plays an imperative role in various UAV-related missions, such as formation control and anti-UAV measures. Accurately detecting and tracking UAVs in a 3D space remains a particularly challenging problem, as it requires extracting sparse features of micro UAVs from different flight environments and continuously matching correspondences, especially during agile flight. Generally, cameras and LiDARs are the two main types of sensors used to capture UAV trajectories in flight. However, both sensors have limitations in UAV classification and pose estimation. This technical report briefly introduces the method proposed by our team "NTU-ICG" for the CVPR 2024 UG2+ Challenge Track 5. This work develops a clustering-based learning detection approach, CL-Det, for UAV tracking and pose estimation using two types of LiDARs, namely Livox Avia and LiDAR 360. We combine the information from the two data sources to locate drones in 3D. We first align the timestamps of Livox Avia data and LiDAR 360 data and then separate the point cloud of objects of interest (OOIs) from the environment. The point cloud of OOIs is clustered using the DBSCAN method, with the midpoint of the largest cluster assumed to be the UAV position. Furthermore, we utilize historical estimations to fill in missing data. The proposed method shows competitive pose estimation performance and ranks 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge.

* Submitted Report of CVPR 2024 UG2+ Challenge Track 5

Via

Access Paper or Ask Questions

Vision-based Learning for Drones: A Survey

Dec 08, 2023

Jiaping Xiao, Rangya Zhang, Yuhang Zhang, Mir Feroskhan

Abstract:Drones as advanced cyber-physical systems are undergoing a transformative shift with the advent of vision-based learning, a field that is rapidly gaining prominence due to its profound impact on drone autonomy and functionality. Different from existing task-specific surveys, this review offers a comprehensive overview of vision-based learning in drones, emphasizing its pivotal role in enhancing their operational capabilities. We start by elucidating the fundamental principles of vision-based learning, highlighting how it significantly improves drones' visual perception and decision-making processes. We then categorize vision-based control methods into indirect, semi-direct, and end-to-end approaches from the perception-control perspective. We further explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios, and underscore the challenges and innovations characterizing each area. Finally, we explore open questions and potential solutions, paving the way for ongoing research and development in this dynamic and rapidly evolving field. With growing large language models (LLMs) and embodied intelligence, vision-based learning for drones provides a promising but challenging road towards artificial general intelligence (AGI) in 3D physical world.

Via

Access Paper or Ask Questions

Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning

Aug 01, 2023

Yun Chen, Jiaping Xiao

Abstract:Collaborative heterogeneous robot systems can greatly improve the efficiency of target search and navigation tasks. In this paper, we design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments. The system is able to search for targets and navigate to them in a maze-like mine environment with the policies learned through deep reinforcement learning algorithms. During the training process, if two robots are trained simultaneously, the rewards related to their collaboration may not be properly obtained. Hence, we introduce a multi-stage reinforcement learning framework and a curiosity module to encourage agents to explore unvisited environments. Experiments in simulation environments show that our framework can train the heterogeneous robot system to achieve the search and navigation with unknown target locations while existing baselines may not, and accelerate the training speed.

Via

Access Paper or Ask Questions

EEG-based Sleep Staging with Hybrid Attention

May 16, 2023

Xinliang Zhou, Chenyu Liu, Jiaping Xiao, Yang Liu

Figure 1 for EEG-based Sleep Staging with Hybrid Attention

Figure 2 for EEG-based Sleep Staging with Hybrid Attention

Abstract:Sleep staging is critical for assessing sleep quality and diagnosing sleep disorders. However, capturing both the spatial and temporal relationships within electroencephalogram (EEG) signals during different sleep stages remains challenging. In this paper, we propose a novel framework called the Hybrid Attention EEG Sleep Staging (HASS) Framework. Specifically, we propose a well-designed spatio-temporal attention mechanism to adaptively assign weights to inter-channels and intra-channel EEG segments based on the spatio-temporal relationship of the brain during different sleep stages. Experiment results on the MASS and ISRUC datasets demonstrate that HASS can significantly improve typical sleep staging networks. Our proposed framework alleviates the difficulties of capturing the spatial-temporal relationship of EEG signals during sleep staging and holds promise for improving the accuracy and reliability of sleep assessment in both clinical and research settings.

Via

Access Paper or Ask Questions

An EEG Channel Selection Framework for Driver Drowsiness Detection via Interpretability Guidance

Apr 26, 2023

Xinliang Zhou, Dan Lin, Ziyu Jia, Jiaping Xiao, Chenyu Liu, Liming Zhai, Yang Liu

Abstract:Drowsy driving has a crucial influence on driving safety, creating an urgent demand for driver drowsiness detection. Electroencephalogram (EEG) signal can accurately reflect the mental fatigue state and thus has been widely studied in drowsiness monitoring. However, the raw EEG data is inherently noisy and redundant, which is neglected by existing works that just use single-channel EEG data or full-head channel EEG data for model training, resulting in limited performance of driver drowsiness detection. In this paper, we are the first to propose an Interpretability-guided Channel Selection (ICS) framework for the driver drowsiness detection task. Specifically, we design a two-stage training strategy to progressively select the key contributing channels with the guidance of interpretability. We first train a teacher network in the first stage using full-head channel EEG data. Then we apply the class activation mapping (CAM) to the trained teacher model to highlight the high-contributing EEG channels and further propose a channel voting scheme to select the top N contributing EEG channels. Finally, we train a student network with the selected channels of EEG data in the second stage for driver drowsiness detection. Experiments are designed on a public dataset, and the results demonstrate that our method is highly applicable and can significantly improve the performance of cross-subject driver drowsiness detection.

Via

Access Paper or Ask Questions

AMS-DRL: Learning Multi-Pursuit Evasion for Safe Targeted Navigation of Drones

Apr 07, 2023

Jiaping Xiao, Mir Feroskhan

Abstract:Safe navigation of drones in the presence of adversarial physical attacks from multiple pursuers is a challenging task. This paper proposes a novel approach, asynchronous multi-stage deep reinforcement learning (AMS-DRL), to train an adversarial neural network that can learn from the actions of multiple pursuers and adapt quickly to their behavior, enabling the drone to avoid attacks and reach its target. Our approach guarantees convergence by ensuring Nash Equilibrium among agents from the game-theory analysis. We evaluate our method in extensive simulations and show that it outperforms baselines with higher navigation success rates. We also analyze how parameters such as the relative maximum speed affect navigation performance. Furthermore, we have conducted physical experiments and validated the effectiveness of the trained policies in real-time flights. A success rate heatmap is introduced to elucidate how spatial geometry influences navigation outcomes. Project website: https://github.com/NTU-UAVG/AMS-DRL-for-Pursuit-Evasion.

Via

Access Paper or Ask Questions

Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multi-stage Reinforcement Learning Approach

Apr 27, 2022

Jiaping Xiao, Phumrapee Pisutsin, Mir Feroskhan

Figure 1 for Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multi-stage Reinforcement Learning Approach

Figure 2 for Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multi-stage Reinforcement Learning Approach

Figure 3 for Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multi-stage Reinforcement Learning Approach

Figure 4 for Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multi-stage Reinforcement Learning Approach

Abstract:Equipping drones with target search capabilities is desirable for applications in disaster management scenarios and smart warehouse delivery systems. Instead of deploying a single drone, an intelligent drone swarm that can collaborate with one another in maneuvering among obstacles will be more effective in accomplishing the target search in a shorter amount of time. In this work, we propose a data-efficient reinforcement learning-based approach, Adaptive Curriculum Embedded Multi-Stage Learning (ACEMSL), to address the challenges of carrying out a collaborative target search with a visual drone swarm, namely the 3D sparse reward space exploration and the collaborative behavior requirement. Specifically, we develop an adaptive embedded curriculum, where the task difficulty level can be adaptively adjusted according to the success rate achieved in training. Meanwhile, with multi-stage learning, ACEMSL allows data-efficient training and individual-team reward allocation for the collaborative drone swarm. The effectiveness and generalization capability of our approach are validated using simulations and actual flight tests.

* Submitted to IEEE Transactions on Neural Networks and Learning Systems

Via

Access Paper or Ask Questions