Abstract:Anomaly detection (AD) is an important machine learning task with many real-world uses, including fraud detection, medical diagnosis, and industrial monitoring. Within natural language processing (NLP), AD helps detect issues like spam, misinformation, and unusual user activity. Although large language models (LLMs) have had a strong impact on tasks such as text generation and summarization, their potential for AD remains underexplored. This paper introduces AD-LLM, the first benchmark that evaluates how LLMs can help with NLP anomaly detection. We examine three key tasks: (i) zero-shot detection, using LLMs' pre-trained knowledge to perform AD without task-specific training; (ii) data augmentation, generating synthetic data and category descriptions to improve AD models; and (iii) model selection, using LLMs to suggest unsupervised AD models. Through experiments on different datasets, we find that LLMs can perform well in zero-shot AD, that carefully designed augmentation methods are useful, and that explaining model selection for specific datasets remains challenging. Based on these results, we outline six future research directions on LLMs for AD.
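As a concrete illustration of the zero-shot setting described above, the minimal sketch below prompts a chat LLM to label a single text as normal or anomalous given only a description of the normal class. It assumes the OpenAI Python client (v1+) with an API key in OPENAI_API_KEY; the model name, prompt wording, and scoring rule are placeholders for illustration, not the prompts used in AD-LLM.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_anomaly_score(text: str, normal_description: str) -> float:
    """Return 1.0 if the LLM flags `text` as anomalous relative to the described normal class."""
    prompt = (
        f"The normal samples in this dataset are: {normal_description}\n"
        f"Sample: {text}\n"
        "Reply with a single word: 'normal' or 'anomaly'."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    answer = resp.choices[0].message.content.strip().lower()
    return 1.0 if "anomaly" in answer else 0.0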
Abstract:Anomaly detection (AD) is a critical machine learning task with diverse applications in web systems, including fraud detection, content moderation, and user behavior analysis. Despite its significance, AD in natural language processing (NLP) remains underexplored, limiting advancements in detecting anomalies in text data such as harmful content, phishing attempts, or spam reviews. In this paper, we introduce NLP-ADBench, the most comprehensive benchmark for NLP anomaly detection (NLP-AD), comprising eight curated datasets and evaluations of nineteen state-of-the-art algorithms. These include three end-to-end methods and sixteen two-step algorithms that apply traditional anomaly detection techniques to language embeddings generated by bert-base-uncased and OpenAI's text-embedding-3-large models. Our results reveal critical insights and future directions for NLP-AD. Notably, no single model excels across all datasets, highlighting the need for automated model selection. Moreover, two-step methods leveraging transformer-based embeddings consistently outperform specialized end-to-end approaches, with OpenAI embeddings demonstrating superior performance over BERT embeddings. By releasing NLP-ADBench at https://github.com/USC-FORTIS/NLP-ADBench, we provide a standardized framework for evaluating NLP-AD methods, fostering the development of innovative approaches. This work fills a crucial gap in the field and establishes a foundation for advancing NLP anomaly detection, particularly in the context of improving the safety and reliability of web-based systems.
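The two-step recipe highlighted in this abstract can be sketched as follows: embed each text with text-embedding-3-large, then score the embeddings with a classical detector (Isolation Forest from PyOD serves here as one stand-in for the sixteen detectors in the benchmark). The data and model choices below are illustrative assumptions, not NLP-ADBench's exact pipeline.

import numpy as np
from openai import OpenAI
from pyod.models.iforest import IForest

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([d.embedding for d in resp.data])

train_texts = ["a typical product review ...", "another ordinary review ..."]  # placeholder corpus
test_texts = ["buy cheap pills now!!!"]                                        # placeholder query

detector = IForest()                      # step 2: classical detector on embeddings
detector.fit(embed(train_texts))          # step 1: language embeddings feed step 2
scores = detector.decision_function(embed(test_texts))  # higher score = more anomalous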
Abstract:Out-of-distribution (OOD) detection is essential for ensuring the robustness of machine learning models by identifying samples that deviate from the training distribution. While traditional OOD detection has primarily focused on single-modality inputs, such as images, recent advances in multimodal models have demonstrated the potential of leveraging multiple modalities (e.g., video, optical flow, audio) to enhance detection performance. However, existing methods often overlook intra-class variability within in-distribution (ID) data, assuming that samples of the same class are perfectly cohesive and consistent. This assumption can lead to performance degradation, especially when prediction discrepancies are uniformly amplified across all samples. To address this issue, we propose Dynamic Prototype Updating (DPU), a novel plug-and-play framework for multimodal OOD detection that accounts for intra-class variations. Our method dynamically updates the class-center representation of each class by measuring the variance of similar samples within each batch, enabling adaptive adjustments. This approach allows us to amplify prediction discrepancies based on the updated class centers, thereby improving the model's robustness and generalization across different modalities. Extensive experiments on two tasks, five datasets, and nine base OOD algorithms demonstrate that DPU significantly improves OOD detection performance, setting a new state-of-the-art in multimodal OOD detection, with improvements of up to 80 percent in Far-OOD detection. To facilitate accessibility and reproducibility, our code is publicly available on GitHub.
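One way to read the prototype-update idea is sketched below: a variance-aware moving average in which tightly clustered (low-variance) batches shift a class center more than noisy ones. The update rule and hyperparameters are assumptions made for illustration; they are not DPU's exact formulation.

import torch

def update_prototype(prototype, feats, base_momentum=0.9):
    """prototype: (d,) current class center; feats: (n, d) same-class features in the batch."""
    batch_mean = feats.mean(dim=0)
    dispersion = feats.var(dim=0, unbiased=False).mean()  # scalar intra-class variance of the batch
    # Higher dispersion -> momentum closer to 1 -> the old center is retained more strongly.
    momentum = base_momentum + (1.0 - base_momentum) * torch.sigmoid(dispersion)
    return momentum * prototype + (1.0 - momentum) * batch_mean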
Abstract:This paper presents a novel trajectory planning method for aerial perching. Compared with existing work, the terminal states and trajectory durations can be adjusted adaptively instead of being determined in advance. Furthermore, our planner is able to minimize the tangential relative speed while preserving safety and dynamic feasibility. This feature is especially valuable for micro aerial robots with low maneuverability or in scenarios with limited space. Moreover, we design a flexible transformation strategy that eliminates terminal constraints while reducing the number of optimization variables. In addition, we take precise SE(3) motion planning into account to ensure that the drone does not touch the landing platform until the last moment. The proposed method is validated onboard a palm-sized micro aerial robot with quite limited thrust and moment (thrust-to-weight ratio 1.7) perching on a mobile inclined surface. Extensive experimental results show that our planner generates an optimal trajectory within 20 ms and replans with a warm start in 2 ms.
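For intuition, the "tangential relative speed" the planner minimizes can be computed as below: the component of the drone-platform relative velocity that lies in the landing plane. Variable names and the simple projection are illustrative assumptions, not the paper's formulation.

import numpy as np

def tangential_relative_speed(v_drone, v_platform, plane_normal):
    """Terminal velocities in the world frame; plane_normal is the normal of the (moving, inclined) landing surface."""
    n = plane_normal / np.linalg.norm(plane_normal)
    v_rel = v_drone - v_platform
    v_tan = v_rel - np.dot(v_rel, n) * n   # drop the component along the surface normal
    return float(np.linalg.norm(v_tan))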
Abstract:This letter presents a complete framework, Meeting-Merging-Mission, for multi-robot exploration under communication restrictions. Considering that communication is limited in both bandwidth and range in the real world, we propose a lightweight environment representation method and an efficient cooperative exploration strategy. To lower bandwidth usage, each robot utilizes specific polytopes to maintain free space and super frontier information (SFI) as the source for exploration decision-making. To reduce repeated exploration, we develop a mission-based protocol that drives robots to share collected information at stable rendezvous. We also design a complete path planning scheme for both centralized and decentralized cases. To validate that our framework is practical and generic, we present an extensive benchmark and deploy our system on multi-UGV and multi-UAV platforms.
Abstract:Recently, the community has witnessed numerous datasets built for developing and testing state estimators. However, for applications such as aerial transportation or search-and-rescue, contact forces or other disturbances must be perceived for robust planning and control, which is beyond the scope of these datasets. This paper introduces a Visual-Inertial-Dynamical (VID) dataset that not only targets traditional six-degrees-of-freedom (6DOF) pose estimation but also provides the dynamical characteristics of the flight platform for external force perception or dynamics-aided estimation. The VID dataset contains hardware-synchronized imagery and inertial measurements, with accurate ground-truth trajectories for evaluating common visual-inertial estimators. Moreover, the proposed dataset highlights measurements of rotor speed and motor current, dynamical inputs, and ground-truth 6-axis force data for evaluating external force estimation. To the best of our knowledge, the proposed VID dataset is the first public dataset containing visual-inertial and complete dynamical information for pose and external force evaluation. The dataset and related open-source files are available at \url{https://github.com/ZJU-FAST-Lab/VID-Dataset}.
Abstract:In recent years, several works have advanced the development of aerial tracking. One representative work is our previous system, Fast-tracker, which is applicable to various challenging tracking scenarios. However, it suffers from two main drawbacks: 1) the oversimplification of target detection through the use of artificial markers, and 2) the conflict between simultaneous target and environment perception with limited onboard vision. In this paper, to address the former, we upgrade the target detection in Fast-tracker to detect and localize a human target based on deep learning and non-linear regression. For the latter, we equip the quadrotor system with 360-degree active vision on a customized gimbal camera. Furthermore, we improve the tracking trajectory planning in Fast-tracker by incorporating an occlusion-aware mechanism that generates observable tracking trajectories. Comprehensive real-world tests confirm the proposed system's robustness and real-time capability. Benchmark comparisons with Fast-tracker validate that the proposed system achieves better tracking performance, even on more difficult tracking tasks.
Abstract:Recently, quadrotors have been gaining significant attention in aerial transportation and delivery. In these scenarios, accurate estimation of the external force is as essential as the 6 degree-of-freedom (DoF) pose, since it is of vital importance for planning and control of the vehicle. To this end, we propose a tightly coupled Visual-Inertial-Dynamics (VID) system that simultaneously estimates the external force applied to the quadrotor and the 6 DoF pose. Our method builds on a state-of-the-art optimization-based visual-inertial system, with a novel derivation of the dynamics and external force factor extended from VIMO. Utilizing the proposed dynamics and external force factor, our estimator robustly and accurately estimates the external force even when it varies widely. Moreover, since we explicitly consider the influence of the external force, our method shows comparable or superior pose accuracy compared with VIMO and VINS-Mono, even when the external force ranges from negligible to significant. The robustness and effectiveness of the proposed method are validated by extensive real-world experiments and application-scenario simulations. We will release an open-source package of this method, along with datasets containing ground-truth force measurements, for the reference of the community.
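The estimator itself folds the dynamics and external force factor into a factor-graph optimization; as a rough, hedged illustration of the underlying rigid-body relation only, the snippet below recovers an external force from f_ext = m*a - m*g - R*T. It is not the paper's tightly coupled formulation, and all names are placeholders.

import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # world-frame gravity, z-axis up

def external_force(mass, accel_world, R_wb, collective_thrust):
    """mass [kg]; accel_world [m/s^2]: world-frame linear acceleration;
    R_wb: body-to-world rotation (3x3); collective_thrust [N] along body z."""
    thrust_world = R_wb @ np.array([0.0, 0.0, collective_thrust])
    # Newton's second law: m*a = m*g + R*T + f_ext  ->  solve for f_ext.
    return mass * accel_world - mass * GRAVITY - thrust_world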