Abstract: Recent text-to-image diffusion models leverage cross-attention layers, which have been effectively utilized to enhance a range of visual generative tasks. However, our understanding of cross-attention layers remains somewhat limited. In this study, we present a method for constructing Head Relevance Vectors (HRVs) that align with useful visual concepts. An HRV for a given visual concept is a vector with a length equal to the total number of cross-attention heads, where each element represents the importance of the corresponding head for the given visual concept. We develop and employ an ordered weakening analysis to demonstrate the effectiveness of HRVs as interpretable features. To demonstrate the utility of HRVs, we propose concept strengthening and concept adjusting methods and apply them to enhance three visual generative tasks. We show that misinterpretations of polysemous words in image generation can be corrected in most cases, five challenging attributes in image editing can be successfully modified, and catastrophic neglect in multi-concept generation can be mitigated. Overall, our work advances the understanding of cross-attention layers and introduces new approaches for fine-grained control of these layers at the head level.
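As a rough illustration of the data structure described above, an HRV can be treated as a plain list with one importance score per cross-attention head. The sketch below ranks heads by that score and builds a cumulative schedule of heads to weaken. The function names, the step size, and the choice to weaken the most relevant heads first are assumptions made for illustration; the abstract does not specify how the ordered weakening analysis is carried out.

```python
def heads_by_relevance(hrv, descending=True):
    """Return head indices sorted by their HRV score.

    `hrv` is a hypothetical list with one score per cross-attention head.
    """
    return sorted(range(len(hrv)), key=lambda h: hrv[h], reverse=descending)


def weakening_schedule(hrv, step):
    """Build cumulative groups of heads to weaken, most relevant first.

    The ordering direction and `step` are illustrative assumptions.
    """
    order = heads_by_relevance(hrv)
    return [order[:k] for k in range(step, len(order) + 1, step)]


if __name__ == "__main__":
    toy_hrv = [0.1, 0.7, 0.2, 0.5]  # toy scores for four heads
    print(heads_by_relevance(toy_hrv))
    print(weakening_schedule(toy_hrv, step=2))
```

Each entry in the schedule is the set of heads weakened at that stage, so comparing generations across stages would show how quickly a concept degrades.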
Abstract: This letter presents a distributed trajectory planning method for multi-agent aerial tracking. The proposed method uses a Dynamic Buffered Voronoi Cell (DBVC) and a Dynamic Inter-Visibility Cell (DIVC) to formulate the distributed trajectory generation. Specifically, the DBVC and the DIVC are time-variant spaces that prevent mutual collisions and occlusions among agents, while enabling them to maintain suitable distances from the moving target. We combine the DBVC and the DIVC with an efficient Bernstein polynomial motion primitive-based tracking generation method, which has been refined into a less conservative approach than in our previous work. The proposed algorithm can compute each agent's trajectory within several milliseconds on an Intel i7 desktop. We validate the tracking performance in challenging scenarios, including environments with dozens of obstacles.
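For intuition about the collision-avoidance cell, the sketch below tests membership in a static buffered Voronoi cell: the set of points on agent i's side of each bisector with a neighbor, retracted by a safety radius. This is the standard static construction, not the time-variant DBVC itself, whose exact definition is not given in the abstract. The function name and the parameter `r` are illustrative assumptions.

```python
import math


def in_buffered_voronoi_cell(p, p_i, neighbors, r):
    """Check whether point p lies in agent i's buffered Voronoi cell.

    For each neighbor position p_j, p must lie at least `r` behind the
    perpendicular bisector of p_i and p_j, on agent i's side.
    Assumes no neighbor coincides with p_i.
    """
    for p_j in neighbors:
        direction = [b - a for a, b in zip(p_i, p_j)]
        norm = math.hypot(*direction)
        midpoint = [(a + b) / 2 for a, b in zip(p_i, p_j)]
        # Signed distance from the bisector, positive toward the neighbor.
        signed = sum((x - m) * d for x, m, d in zip(p, midpoint, direction)) / norm
        if signed > -r:
            return False
    return True


if __name__ == "__main__":
    print(in_buffered_voronoi_cell((1.0, 0.0), (0.0, 0.0), [(4.0, 0.0)], r=0.5))
    print(in_buffered_voronoi_cell((1.8, 0.0), (0.0, 0.0), [(4.0, 0.0)], r=0.5))
```

If every agent stays inside its own cell built this way, any two agents remain at least `2 * r` apart, which is why the cell gives a distributed safety condition.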
Abstract: This letter presents a versatile trajectory planning pipeline for aerial tracking. The proposed tracker is capable of handling various chasing settings such as complex unstructured environments, crowded dynamic obstacles and multiple-target following. Within the pipeline, we focus on developing a predictor for future target motion and a chasing trajectory planner. For rapid computation, we employ the sample-check-select strategy: modules sample a set of candidate movements, check multiple constraints, and then select the best trajectory. Also, we leverage the properties of Bernstein polynomials for quick calculations. The prediction module predicts target trajectories that do not overlap with static or dynamic obstacles. The trajectory planner then outputs a trajectory that satisfies conditions such as occlusion and collision avoidance, visibility of all targets within the camera image, and dynamical limits. We thoroughly test the proposed tracker in simulations and hardware experiments under challenging scenarios, including dual-target following, environments with dozens of dynamic obstacles and complex indoor and outdoor spaces.
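The sample-check-select strategy named above can be sketched generically as follows. This is a minimal illustration of the pattern only; the callables `sample`, `checks`, and `cost` are hypothetical placeholders and do not reflect the paper's actual motion primitives, constraints, or cost function.

```python
import random


def sample_check_select(sample, checks, cost, n_samples=100, rng=None):
    """Sample candidates, keep those passing every check, return the cheapest.

    Returns None when no candidate satisfies all constraints.
    """
    rng = rng or random.Random(0)
    candidates = [sample(rng) for _ in range(n_samples)]  # sample
    feasible = [c for c in candidates if all(chk(c) for chk in checks)]  # check
    return min(feasible, key=cost) if feasible else None  # select


if __name__ == "__main__":
    # Toy 1-D example: pick a non-negative value closest to 0.5.
    best = sample_check_select(
        sample=lambda rng: rng.uniform(-1.0, 1.0),
        checks=[lambda x: x >= 0.0],
        cost=lambda x: abs(x - 0.5),
    )
    print(best)
```

Because each candidate is checked independently, the check stage is easy to parallelize, which fits the rapid-computation goal stated in the abstract.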
Abstract: In text-to-image personalization, a timely and crucial challenge is the tendency of generated images to overfit to the biases present in the reference images. We begin our study with a comprehensive categorization of the biases into background, nearby-object, tied-object, substance (in style re-contextualization), and pose biases. These biases manifest in the generated images due to their entanglement with the subject embedding. This undesired embedding entanglement not only results in the reflection of biases from the reference images into the generated images but also notably diminishes the alignment of the generated images with the given generation prompt. To address this challenge, we propose SID (Selectively Informative Description), a text description strategy that deviates from the prevalent approach of only characterizing the subject's class identification. SID is generated using multimodal GPT-4 and can be seamlessly integrated into optimization-based models. We present comprehensive experimental results along with analyses of cross-attention maps, subject-alignment, non-subject-disentanglement, and text-alignment.
Abstract: Maintaining the visibility of the targets is one of the major objectives of aerial tracking applications. This paper proposes QP Chaser, a trajectory planning pipeline that can enhance the visibility of single and dual targets in both static and dynamic environments. As the name suggests, the proposed planner generates a target-visible trajectory via quadratic programming problems. First, the predictor forecasts the reachable sets of moving objects with a sample-and-check strategy considering obstacles. Subsequently, the trajectory planner reinforces the visibility of targets with consideration of 1) path topology and 2) reachable sets of targets and obstacles. We define a target-visible region (TVR) through topology analysis of both static and dynamic obstacles; the TVR reflects the reachable sets of moving targets and obstacles so that the whole body of the target stays within the camera image robustly and continuously. The online performance of the proposed planner is validated in multiple scenarios, including high-fidelity simulations and real-world experiments.
Abstract: This paper presents a decentralized multi-agent trajectory planning (MATP) algorithm that is guaranteed to generate a safe, deadlock-free trajectory in an obstacle-rich environment under a limited communication range. The proposed algorithm utilizes a grid-based multi-agent path planning (MAPP) algorithm for deadlock resolution, and we introduce a subgoal optimization method that makes the agent converge to the waypoint generated by the MAPP without deadlock. In addition, the proposed algorithm ensures the feasibility of the optimization problem and collision avoidance by adopting a linear safe corridor (LSC). We verify that the proposed algorithm does not cause a deadlock in both random forests and dense mazes regardless of communication range, and it outperforms our previous work in flight time and distance. We validate the proposed algorithm through a hardware demonstration with ten quadrotors.
Abstract: This paper presents a new online multi-agent trajectory planning algorithm that is guaranteed to generate safe, dynamically feasible trajectories in a cluttered environment. The proposed algorithm utilizes a linear safe corridor (LSC) to formulate the distributed trajectory optimization problem with only feasible constraints, so it does not resort to slack variables or soft constraints to avoid optimization failure. Also, we adopt a priority-based goal planning method to prevent deadlock without additional communication for decision making. The proposed algorithm can compute the trajectories for 60 agents in 15.5 ms per agent on average on an Intel i7 laptop and can find trajectories that reach the goal without deadlock in both a random forest and an indoor space. We validate the safety and operability of the proposed algorithm through a real flight test with ten quadrotors in a maze-like environment.
Abstract: This paper presents a new trajectory planning method for multiple quadrotors in obstacle-dense environments. We propose a relative safe flight corridor (RSFC) to model the safe region between a pair of agents, and it is used to generate linear constraints for inter-agent collision avoidance by utilizing the convex hull property of the relative Bernstein polynomial. Our approach employs a graph-based multi-agent pathfinding algorithm to generate an initial trajectory, which is used to construct a safe flight corridor (SFC) and an RSFC. We express the trajectory as a piecewise Bernstein polynomial and formulate the trajectory planning problem as a single quadratic programming problem using linear constraints from the SFC and RSFC. The proposed method can compute collision-free trajectories for 16 agents within a second and for 64 agents in less than a minute, and it is validated through both simulation and an indoor flight test.
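The convex hull property mentioned above can be illustrated with a small sketch. A Bernstein polynomial always stays inside the convex hull of its control points, and the difference of two same-degree curves on the same time interval is itself a Bernstein polynomial whose control points are the pairwise differences. A linear (half-space) constraint imposed on those relative control points therefore holds along the whole relative trajectory. The function names and the toy control points are assumptions for illustration, not the paper's RSFC construction.

```python
from math import comb


def bernstein_eval(ctrl, t):
    """Evaluate a Bernstein polynomial with control points `ctrl` at t in [0, 1]."""
    n = len(ctrl) - 1
    dim = len(ctrl[0])
    return tuple(
        sum(comb(n, i) * t**i * (1 - t) ** (n - i) * ctrl[i][k] for i in range(n + 1))
        for k in range(dim)
    )


def relative_ctrl(ctrl_a, ctrl_b):
    """Control points of the relative curve a(t) - b(t) (same degree assumed)."""
    return [tuple(a - b for a, b in zip(pa, pb)) for pa, pb in zip(ctrl_a, ctrl_b)]


def satisfies_halfspace(ctrl, normal, offset):
    """True if every control point p satisfies normal . p >= offset."""
    return all(sum(n * c for n, c in zip(normal, p)) >= offset for p in ctrl)


if __name__ == "__main__":
    ctrl_a = [(0.0, 2.0), (1.0, 3.0), (2.0, 2.0)]  # toy agent A control points
    ctrl_b = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]  # toy agent B control points
    rel = relative_ctrl(ctrl_a, ctrl_b)
    print(satisfies_halfspace(rel, normal=(0.0, 1.0), offset=1.5))
```

Checking a finite set of control points in place of every time instant is what lets the inter-agent constraint be written as linear inequalities in a quadratic program.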