Abstract:In this paper, we propose an adaptive event-triggered reinforcement learning control scheme for continuous-time nonlinear systems that are subject to bounded uncertainties and characterized by complex interactions. Specifically, the proposed method jointly learns both the control policy and the communication policy, thereby reducing the number of parameters and the computational overhead compared with learning the two policies separately or learning only one of them. By augmenting the state space with accrued rewards that represent the performance over the entire trajectory, we show that accurate and efficient determination of triggering conditions is possible without the need to learn the triggering conditions explicitly, thereby leading to an adaptive non-stationary policy. Finally, we provide several numerical examples to demonstrate the effectiveness of the proposed approach.
Abstract:In this paper, we consider the tracking of arbitrary curvilinear geometric paths in three-dimensional output spaces of unmanned aerial vehicles (UAVs) without pre-specified timing requirements, commonly referred to as the path-following problem, subject to bounded inputs. Specifically, we propose a novel nonlinear path-following guidance law for a UAV that enables it to follow any smooth curvilinear path in three dimensions while accounting for the bounded control authority in the design. The proposed solution offers a general treatment of the path-following problem by removing the dependency on the path's geometry, which makes it applicable to paths with varying levels of complexity and smooth curvatures. Additionally, the proposed strategy draws inspiration from the pursuit guidance approach, which is known for its simplicity and ease of implementation. Theoretical analysis guarantees that the UAV converges to its desired path within a fixed time and remains on it irrespective of its initial configuration with respect to the path. Finally, simulations demonstrate the merits and effectiveness of the proposed guidance strategy through a wide range of engagement scenarios, showcasing the UAV's ability to follow diverse curvilinear paths accurately.
Abstract:This article presents a three-dimensional nonlinear trajectory tracking control strategy for unmanned aerial vehicles (UAVs) in the presence of spatial constraints. As opposed to many existing control strategies, which do not consider spatial constraints, the proposed strategy imposes spatial constraints on the movement of the UAV in each degree of freedom. Such consideration makes the design appealing for many practical applications, such as pipeline inspection and boundary tracking. The proposed design accounts for the limited information about the inertia matrix, thereby affirming its inherent robustness against unmodeled dynamics and other imperfections. We rigorously show that the UAV will converge to its desired path while maintaining bounded position, orientation, and linear and angular speeds. Finally, we demonstrate the effectiveness of the proposed strategy through various numerical simulations.
Abstract:In this paper, we address the problem of enclosing an arbitrarily moving target in three dimensions by a single pursuer, which is an unmanned aerial vehicle (UAV), for maximum coverage while also ensuring the pursuer's safety by preventing collisions with the target. The proposed guidance strategy steers the pursuer to a safe region of space surrounding the target, allowing it to maintain a certain distance from the latter while offering greater flexibility in positioning and converging to any orbit within this safe zone. Our approach is distinguished by the use of nonholonomic constraints to model vehicles with accelerations serving as control inputs and coupled engagement kinematics to craft the pursuer's guidance law meticulously. Furthermore, we leverage the concept of the Lyapunov Barrier Function as a powerful tool to constrain the distance between the pursuer and the target within asymmetric bounds, thereby ensuring the pursuer's safety within the predefined region. To validate the efficacy and robustness of our algorithm, we conduct experimental tests by implementing a high-fidelity quadrotor model within Software-in-the-loop (SITL) simulations, encompassing various challenging target maneuver scenarios. The results obtained showcase the resilience of the proposed guidance law, effectively handling arbitrarily maneuvering targets, vehicle/autopilot dynamics, and external disturbances. Our method consistently delivers stable global enclosing behaviors, even in response to aggressive target maneuvers, and requires only relative information for successful execution.
Abstract:This paper introduces an approach to address the target enclosing problem using non-holonomic multiagent systems, where agents autonomously self-organize in the desired formation around a fixed target. Our approach combines global enclosing behavior and local collision avoidance mechanisms by devising a novel potential function and sliding manifold. In our approach, agents independently move toward the desired enclosing geometry when apart and activate the collision avoidance mechanism when a collision is imminent, thereby guaranteeing inter-agent safety. We rigorously show that an agent does not need to ensure safety with every other agent, and we put forth the concept of the nearest colliding agent (for any arbitrary agent), with whom ensuring safety is sufficient to avoid collisions in the entire swarm. The proposed control eliminates the need for a fixed or pre-established agent arrangement around the target and requires only relative information between an agent and the target. This makes our design particularly appealing for scenarios with limited global information and significantly reduces communication requirements. Finally, we present simulation results to vindicate the efficacy of the proposed method.
Abstract:This paper addresses the pursuit-evasion problem involving three agents -- a pursuer, an evader, and a defender. We develop cooperative guidance laws for the evader-defender team that guarantee that the defender intercepts the pursuer before it reaches the vicinity of the evader. Unlike heuristic methods, optimal control, differential game formulations, and recently proposed time-constrained guidance techniques, we propose a geometric solution to safeguard the evader from the pursuer's incoming threat. The proposed strategy is computationally efficient and expected to be scalable as the number of agents increases. Another appealing feature of the proposed strategy is that the evader-defender team does not require knowledge of the pursuer's strategy and that the pursuer's interception is guaranteed from arbitrary initial engagement geometries. We further show that the necessary error variables for the evader-defender team vanish within a time that can be exactly prescribed prior to the three-body engagement. Finally, we demonstrate the efficacy of the proposed cooperative defense strategy via simulation in diverse engagement scenarios.
Abstract:In this paper, we address the issue of fairness in preference-based reinforcement learning (PbRL) in the presence of multiple objectives. The main objective is to design control policies that can optimize multiple objectives while treating each objective fairly. Toward this objective, we design a new fairness-induced preference-based reinforcement learning method, or FPbRL. The main idea of FPbRL is to learn vector reward functions associated with multiple objectives via new welfare-based preferences rather than the reward-based preferences used in PbRL, coupled with policy learning via maximizing a generalized Gini welfare function. Finally, we provide experimental studies on three different environments to show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
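The generalized Gini welfare function named in the abstract above can be illustrated with a minimal sketch. It assumes the standard textbook definition (a weighted sum of the objective values sorted in ascending order, with non-increasing weights); the function name `generalized_gini_welfare` and the default halving weights are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def generalized_gini_welfare(returns, weights=None):
    """Weighted sum of per-objective returns, sorted from worst to best.

    Assumption: weights are non-increasing, so the worst-off objective
    receives the largest weight. The default weights (1, 1/2, 1/4, ...,
    normalized to sum to 1) are an illustrative choice only.
    """
    returns = np.asarray(returns, dtype=float)
    if weights is None:
        weights = 0.5 ** np.arange(returns.size)
        weights = weights / weights.sum()
    weights = np.asarray(weights, dtype=float)
    if weights.shape != returns.shape:
        raise ValueError("weights and returns must have the same shape")
    if np.any(np.diff(weights) > 0):
        raise ValueError("weights must be non-increasing")
    # np.sort is ascending, so weights[0] multiplies the smallest return.
    return float(np.dot(weights, np.sort(returns)))


balanced = generalized_gini_welfare([5.0, 5.0])
unbalanced = generalized_gini_welfare([9.0, 1.0])
print(balanced, unbalanced)
```

With the default weights (2/3, 1/3), the balanced vector scores 5.0 while the unbalanced vector with the same total scores about 3.67, which is the sense in which maximizing this welfare function favors equitable outcomes.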
Abstract:This paper considers a pursuit-evasion scenario among three agents -- an evader, a pursuer, and a defender. We design cooperative guidance laws for the evader-defender team to safeguard the evader from an attacking pursuer. Unlike differential games, optimal control formulations, and other heuristic methods, we propose a novel perspective on designing effective nonlinear feedback control laws for the evader-defender team using a time-constrained guidance approach. The evader lures the pursuer onto a collision course by offering itself as bait. At the same time, the defender protects the evader from the pursuer by exercising control over the engagement duration. Depending on the nature of the mission, the defender may choose to take an aggressive or defensive stance. Such consideration widens the applicability of the proposed methods to various three-agent motion planning scenarios such as aircraft defense, asset guarding, search and rescue, surveillance, and secure transportation. We use a fixed-time sliding mode control strategy to design the control laws for the evader-defender team and a nonlinear finite-time disturbance observer to estimate the pursuer's maneuver. Finally, we present simulations to demonstrate favorable performance under various engagement geometries, thus vindicating the efficacy of the proposed designs.
Abstract:This work targets the problem of odor source localization by multi-agent systems. A hierarchical cooperative control scheme is put forward to locate the source of an odor by driving the agents to consensus when at least one agent obtains information about the location of the source. The proposed controller is synthesized hierarchically, in three levels: group decision making, path planning, and control. The decision-making level uses information from the agents, via a conventional Particle Swarm Algorithm, together with information about the movement of filaments to predict the location of the odor source. The source location predicted at the decision level is then used to map a trajectory, which is passed to the control level. The distributed control layer uses sliding mode controllers, known for their inherent robustness and their ability to reject matched disturbances completely. Two cases of agent movement towards the source, namely under consensus and in formation, are discussed. Finally, numerical simulations demonstrate the efficacy of the proposed hierarchical distributed control.
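The conventional Particle Swarm Algorithm mentioned in the abstract above can be sketched as follows. This is the textbook velocity and position update only, applied to a hypothetical objective (squared distance to an assumed source position); it omits the filament-movement information and the path-planning and control levels described in the abstract. The function name `pso_minimize`, the coefficients, and the source location are all illustrative assumptions.

```python
import numpy as np


def pso_minimize(objective, low, high, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Textbook particle swarm optimization over the box [low, high]."""
    rng = np.random.default_rng(seed)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    dim = low.size
    pos = rng.uniform(low, high, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()  # best position seen by each particle
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()  # best position seen by the swarm
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia term plus attraction to the personal and global bests.
        vel = (inertia * vel
               + c_personal * r1 * (pbest - pos)
               + c_global * r2 * (gbest - pos))
        pos = np.clip(pos + vel, low, high)
        val = np.array([objective(p) for p in pos])
        improved = val < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())


# Hypothetical stand-in for an odor measurement: squared distance to a source.
source = np.array([3.0, -2.0])
estimate, value = pso_minimize(lambda p: float(np.sum((p - source) ** 2)),
                               low=[-10.0, -10.0], high=[10.0, 10.0])
print(estimate, value)
```

In a real localization setting the objective would come from sensed concentration rather than a known source position; the closed-form objective is used here only so the sketch is self-contained.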
Abstract:We present a new optimization-theoretic approach to analyzing Follow-the-Leader style algorithms, particularly in the setting where perturbations are used as a tool for regularization. We show that adding a strongly convex penalty function to the decision rule and adding stochastic perturbations to data correspond to deterministic and stochastic smoothing operations, respectively. We establish an equivalence between "Follow the Regularized Leader" and "Follow the Perturbed Leader" up to the smoothness properties. This intuition leads to a new generic analysis framework that recovers and improves the previously known regret bounds of the class of algorithms commonly known as Follow the Perturbed Leader.
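One well-known special case makes the correspondence between regularization and perturbation concrete: in the experts setting, Follow the Perturbed Leader with Gumbel noise selects each expert with the same probability as Follow the Regularized Leader with an entropy regularizer (exponential weights). The sketch below checks this numerically by Monte Carlo. It illustrates this special case only and is not the general framework of the abstract above; the function names, the learning rate `eta`, and the loss values are illustrative assumptions.

```python
import numpy as np


def ftrl_entropy_choice(cum_loss, eta):
    """Exponential-weights distribution: p_i proportional to exp(-eta * L_i)."""
    z = -eta * np.asarray(cum_loss, dtype=float)
    z = z - z.max()  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


def ftpl_gumbel_choice(cum_loss, eta, n_samples, rng):
    """Empirical leader frequencies under Gumbel-perturbed cumulative losses."""
    cum_loss = np.asarray(cum_loss, dtype=float)
    noise = rng.gumbel(size=(n_samples, cum_loss.size))
    # Follow the leader on the perturbed losses L_i - G_i / eta.
    leaders = np.argmin(cum_loss - noise / eta, axis=1)
    return np.bincount(leaders, minlength=cum_loss.size) / n_samples


rng = np.random.default_rng(0)
losses = [1.0, 2.0, 4.0]  # hypothetical cumulative losses of three experts
exact = ftrl_entropy_choice(losses, eta=0.5)
empirical = ftpl_gumbel_choice(losses, eta=0.5, n_samples=200_000, rng=rng)
print(exact, empirical)
```

The two printed distributions should agree up to Monte Carlo error, since the argmin over Gumbel-perturbed losses follows the softmax distribution (the Gumbel-max trick).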