Abstract: Multi-agent reinforcement learning is a challenging and active field of research due to the inherent nonstationarity of, and coupling between, the agents. A popular model of the interactions underlying the multi-agent RL problem is the Markov Game. A special class of Markov Games, termed Markov Potential Games, allows the Markov Game to be reduced to a single-objective optimal control problem whose objective is a potential function. In this work, we prove that a multi-agent collaborative field coverage problem, which arises in many engineering applications, can be formulated as a Markov Potential Game, and that a parameterized closed-loop Nash equilibrium can be learned by solving an equivalent single-objective optimal control problem. As a result, our algorithm trains 10x faster than a game-theoretic baseline and converges faster during policy execution.
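To make the reduction concrete, below is a minimal sketch of single-objective policy optimization on a Markov Potential Game: because every agent receives the same potential-valued reward `phi`, plain REINFORCE over the joint policy ascends a single objective whose maximizers are candidate closed-loop Nash equilibria. The environment interface (`env.reset()`/`env.step()` returning a shared potential reward), the tabular softmax policies, and all names are illustrative assumptions, not the paper's actual algorithm or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()                                # numerical stability
    e = np.exp(z)
    return e / e.sum()

def episode(env, theta, gamma=0.99):
    """Roll out all agents' tabular softmax policies on the shared state."""
    s = env.reset()                                # hypothetical env interface
    scores = [np.zeros_like(th) for th in theta]   # sum of d log pi / d theta
    G, disc, done = 0.0, 1.0, False
    while not done:
        actions = []
        for i, th in enumerate(theta):             # theta[i]: (n_states, n_actions)
            p = softmax(th[s])
            a = rng.choice(len(p), p=p)
            scores[i][s] -= p                      # score-function gradient:
            scores[i][s, a] += 1.0                 #   1{a'=a} - pi(a'|s)
            actions.append(a)
        s, phi, done = env.step(actions)           # phi: shared potential reward
        G += disc * phi
        disc *= gamma
    return scores, G

def train(env, theta, iters=2000, lr=0.05):
    """Single-objective REINFORCE ascent on the common potential return."""
    for _ in range(iters):
        scores, G = episode(env, theta)
        for th, sc in zip(theta, scores):
            th += lr * G * sc                      # ascend the one objective
    return theta
```

Because all agents ascend the same scalar return, the multi-agent problem is handled by an off-the-shelf single-agent estimator; no per-agent best-response loop is needed.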
Abstract: C-ADMM (consensus alternating direction method of multipliers) is a well-known distributed optimization framework owing to its convergence guarantees on convex problems. Recently, C-ADMM has been studied in robotics applications such as multi-vehicle target tracking and collaborative manipulation. However, few works have investigated the performance of C-ADMM on non-convex robotics problems, where such theoretical guarantees are lacking. In this project, we quantitatively examine the convergence behavior of non-convex C-ADMM through the lens of distributed multi-robot trajectory planning. We formulate a convex trajectory planning problem by combining C-ADMM with Buffered Voronoi Cells (BVCs) to circumvent the non-convex collision avoidance constraint, and we compare this convex C-ADMM algorithm to a non-convex C-ADMM baseline that retains the non-convex collision avoidance constraints. We show that the convex C-ADMM algorithm requires 1000 fewer iterations to converge in a multi-robot waypoint navigation scenario. We also confirm that the non-convex C-ADMM baseline yields sub-optimal solutions and safety-constraint violations in trajectory generation.
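For reference, here is a hedged sketch of the C-ADMM iteration the project builds on, applied to a toy convex consensus problem: each robot holds a private quadratic cost and a local copy of the decision variable, solves a closed-form local subproblem, exchanges iterates with its graph neighbors, and updates a dual variable. In the project's setting the local subproblem would additionally carry the BVC half-space constraints (still a convex QP); the cost, names, and graph below are illustrative assumptions, not the project's code.

```python
import numpy as np

def c_admm(a, neighbors, rho=1.0, iters=200):
    """Decentralized consensus ADMM for min_x sum_i ||x - a_i||^2."""
    n, d = a.shape
    x = a.astype(float).copy()          # each robot's local copy of x
    y = np.zeros((n, d))                # per-robot dual variables
    for _ in range(iters):
        x_old = x.copy()                # synchronous (Jacobi) update
        for i in range(n):
            # Midpoints with each neighbor act as consensus targets.
            m = np.stack([(x_old[i] + x_old[j]) / 2.0 for j in neighbors[i]])
            # Closed-form argmin of
            #   ||x - a_i||^2 + y_i^T x + rho * sum_j ||x - m_ij||^2.
            x[i] = (2.0 * a[i] - y[i] + 2.0 * rho * m.sum(axis=0)) \
                   / (2.0 + 2.0 * rho * len(neighbors[i]))
        for i in range(n):
            # Dual ascent on the pairwise disagreement.
            y[i] += rho * sum(x[i] - x[j] for j in neighbors[i])
    return x

# Three robots on a line graph agree on the centroid of their targets.
a = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 2.0]])
print(c_admm(a, {0: [1], 1: [0, 2], 2: [1]}))  # each row -> approx [2.0, 0.667]
```

On a connected graph with convex local costs, the local copies provably converge to the global minimizer; the project's experiments probe what happens when that convexity assumption is dropped.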
Abstract: In this work, we develop a scalable, local trajectory optimization algorithm that enables robots to interact with other robots. Prior work has shown that agents' interactions can be successfully captured in game-theoretic formulations, where the interaction outcome is best modeled by the equilibria of the underlying dynamic game. However, computing equilibria of dynamic games is typically challenging, as it requires simultaneously solving a set of coupled optimal control problems. Existing solvers operate in a centralized fashion and do not scale tractably to many interacting agents. We enable scalable, distributed game-theoretic planning by leveraging the structure inherent in multi-agent interactions, namely, interactions belonging to the class of dynamic potential games. Since equilibria of dynamic potential games can be found by minimizing a single potential function, distributed and decentralized control techniques can be applied to seek equilibria of multi-agent interactions in a scalable manner. We compare the performance of our algorithm with a centralized interactive planner in a number of simulation studies and demonstrate that our algorithm achieves better efficiency and scalability. We further evaluate our method in hardware experiments involving multiple quadcopters.
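A minimal sketch of the property exploited above: in a potential game, letting each agent re-optimize only its own trajectory against the others' current plans is exactly block-coordinate descent on the single potential function, so the equilibrium search decomposes into local solves that can run in a distributed fashion. The potential below (a reference-tracking cost plus a symmetric proximity penalty) and all names are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def potential(trajs, refs, w=5.0, d_safe=0.5):
    """Single potential: tracking costs plus pairwise proximity penalties."""
    J = sum(np.sum((tr - r) ** 2) for tr, r in zip(trajs, refs))
    for i in range(len(trajs)):
        for j in range(i + 1, len(trajs)):
            dist = np.linalg.norm(trajs[i] - trajs[j], axis=1)
            J += w * np.sum(np.maximum(0.0, d_safe - dist) ** 2)
    return J

def grad_i(trajs, refs, i, w=5.0, d_safe=0.5):
    """Gradient of the potential w.r.t. agent i's trajectory only."""
    g = 2.0 * (trajs[i] - refs[i])                   # tracking term
    for j in range(len(trajs)):
        if j == i:
            continue
        diff = trajs[i] - trajs[j]                   # per-timestep offsets
        dist = np.linalg.norm(diff, axis=1, keepdims=True)
        gap = np.maximum(0.0, d_safe - dist)         # active only when close
        g -= w * 2.0 * gap * diff / np.maximum(dist, 1e-9)
    return g

def gauss_seidel(trajs, refs, rounds=100, lr=0.05, steps=10):
    """Each agent locally descends the shared potential in turn."""
    for _ in range(rounds):
        for i in range(len(trajs)):
            for _ in range(steps):
                trajs[i] = trajs[i] - lr * grad_i(trajs, refs, i)
    return trajs

# Two agents swap sides along the x-axis; straight-line plans pass too close.
T = 20
t = np.linspace(0.0, 1.0, T)[:, None]
refs = [np.hstack([4.0 * t, np.zeros((T, 1))]),
        np.hstack([4.0 - 4.0 * t, 0.1 * np.ones((T, 1))])]
trajs = [r.copy() for r in refs]
print(potential(trajs, refs))        # positive: proximity penalty active
trajs = gauss_seidel(trajs, refs)
print(potential(trajs, refs))        # lower after distributed descent
```

Each local update only needs the other agents' current trajectories, so this loop maps directly onto a message-passing implementation; at a fixed point no agent can unilaterally reduce its own cost, the defining condition of a Nash equilibrium.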