Abstract:Large language models (LLMs) face the challenge of hallucinations -- outputs that seem coherent but are actually incorrect. A particularly damaging type is fact-conflicting hallucination (FCH), where generated content contradicts established facts. Addressing FCH presents three main challenges: 1) Automatically constructing and maintaining large-scale benchmark datasets is difficult and resource-intensive; 2) Generating complex and efficient test cases that the LLM has not been trained on -- especially those involving intricate temporal features -- is challenging, yet crucial for eliciting hallucinations; and 3) Validating the reasoning behind LLM outputs is inherently difficult, particularly with complex logical relationships, as it requires transparency in the model's decision-making process. This paper presents Drowzee, an innovative end-to-end metamorphic testing framework that utilizes temporal logic to identify fact-conflicting hallucinations (FCH) in large language models (LLMs). Drowzee builds a comprehensive factual knowledge base by crawling sources like Wikipedia and uses automated temporal-logic reasoning to convert this knowledge into a large, extensible set of test cases with ground truth answers. LLMs are tested using these cases through template-based prompts, which require them to generate both answers and reasoning steps. To validate the reasoning, we propose two semantic-aware oracles that compare the semantic structure of LLM outputs to the ground truths. Across nine LLMs in nine different knowledge domains, experimental results show that Drowzee effectively identifies rates of non-temporal-related hallucinations ranging from 24.7% to 59.8%, and rates of temporal-related hallucinations ranging from 16.7% to 39.2%.
Abstract:With the increasing prevalence of autonomous vehicles (AVs), their vulnerability to various types of attacks has grown, presenting significant security challenges. In this paper, we propose a reinforcement learning (RL)-based approach for designing optimal stealthy integrity attacks on AV actuators. We also analyze the limitations of state-of-the-art RL-based secure controllers developed to counter such attacks. Through extensive simulation experiments, we demonstrate the effectiveness and efficiency of our proposed method.
Abstract:Adaptive tracking control for rigid body dynamics is of critical importance in control and robotics, particularly for addressing uncertainties or variations in system model parameters. However, most existing adaptive control methods are designed for systems with states in vector spaces, often neglecting the manifold constraints inherent to robotic systems. In this work, we propose a novel Lie-algebra-based adaptive control method that leverages the intrinsic relationship between the special Euclidean group and its associated Lie algebra. By transforming the state space from the group manifold to a vector space, we derive a linear error dynamics model that decouples model parameters from the system state. This formulation enables the development of an adaptive optimal control method that is both geometrically consistent and computationally efficient. Extensive simulations demonstrate the effectiveness and efficiency of the proposed method. We have made our source code publicly available to the community to support further research and collaboration.
Abstract:Safety-critical cyber-physical systems (CPS), such as quadrotor UAVs, are particularly prone to cyber attacks, which can result in significant consequences if not detected promptly and accurately. During outdoor operations, the nonlinear dynamics of UAV systems, combined with non-Gaussian noise, pose challenges to the effectiveness of conventional statistical and machine learning methods. To overcome these limitations, we present QUADFormer, an advanced attack detection framework for quadrotor UAVs leveraging a transformer-based architecture. This framework features a residue generator that produces sequences sensitive to anomalies, which are then analyzed by the transformer to capture statistical patterns for detection and classification. Furthermore, an alert mechanism ensures UAVs can operate safely even when under attack. Extensive simulations and experimental evaluations highlight that QUADFormer outperforms existing state-of-the-art techniques in detection accuracy.
Abstract:The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence, marked by unprecedented capabilities in natural language understanding and generation. However, the increasing integration of these models into critical applications raises substantial safety concerns, necessitating a thorough examination of their potential risks and associated mitigation strategies. This survey provides a comprehensive overview of the current landscape of LLM safety, covering four major categories: value misalignment, robustness to adversarial attacks, misuse, and autonomous AI risks. In addition to the comprehensive review of the mitigation methodologies and evaluation resources on these four aspects, we further explore four topics related to LLM safety: the safety implications of LLM agents, the role of interpretability in enhancing LLM safety, the technology roadmaps proposed and abided by a list of AI companies and institutes for LLM safety, and AI governance aimed at LLM safety with discussions on international cooperation, policy proposals, and prospective regulatory directions. Our findings underscore the necessity for a proactive, multifaceted approach to LLM safety, emphasizing the integration of technical solutions, ethical considerations, and robust governance frameworks. This survey is intended to serve as a foundational resource for academy researchers, industry practitioners, and policymakers, offering insights into the challenges and opportunities associated with the safe integration of LLMs into society. Ultimately, it seeks to contribute to the safe and beneficial development of LLMs, aligning with the overarching goal of harnessing AI for societal advancement and well-being. A curated list of related papers has been publicly available at https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers.
Abstract:Large Language Models (LLMs) have transformed numerous fields by enabling advanced natural language interactions but remain susceptible to critical vulnerabilities, particularly jailbreak attacks. Current jailbreak techniques, while effective, often depend on input modifications, making them detectable and limiting their stealth and scalability. This paper presents Targeted Model Editing (TME), a novel white-box approach that bypasses safety filters by minimally altering internal model structures while preserving the model's intended functionalities. TME identifies and removes safety-critical transformations (SCTs) embedded in model matrices, enabling malicious queries to bypass restrictions without input modifications. By analyzing distinct activation patterns between safe and unsafe queries, TME isolates and approximates SCTs through an optimization process. Implemented in the D-LLM framework, our method achieves an average Attack Success Rate (ASR) of 84.86% on four mainstream open-source LLMs, maintaining high performance. Unlike existing methods, D-LLM eliminates the need for specific triggers or harmful response collections, offering a stealthier and more effective jailbreak strategy. This work reveals a covert and robust threat vector in LLM security and emphasizes the need for stronger safeguards in model safety alignment.
Abstract:Machine learning has become a crucial tool for predicting the properties of crystalline materials. However, existing methods primarily represent material information by constructing multi-edge graphs of crystal structures, often overlooking the chemical and physical properties of elements (such as atomic radius, electronegativity, melting point, and ionization energy), which have a significant impact on material performance. To address this limitation, we first constructed an element property knowledge graph and utilized an embedding model to encode the element attributes within the knowledge graph. Furthermore, we propose a multimodal fusion framework, ESNet, which integrates element property features with crystal structure features to generate joint multimodal representations. This provides a more comprehensive perspective for predicting the performance of crystalline materials, enabling the model to consider both microstructural composition and chemical characteristics of the materials. We conducted experiments on the Materials Project benchmark dataset, which showed leading performance in the bandgap prediction task and achieved results on a par with existing benchmarks in the formation energy prediction task.
Abstract:Vision-language pre-training (VLP) models, trained on large-scale image-text pairs, have become widely used across a variety of downstream vision-and-language (V+L) tasks. This widespread adoption raises concerns about their vulnerability to adversarial attacks. Non-universal adversarial attacks, while effective, are often impractical for real-time online applications due to their high computational demands per data instance. Recently, universal adversarial perturbations (UAPs) have been introduced as a solution, but existing generator-based UAP methods are significantly time-consuming. To overcome the limitation, we propose a direct optimization-based UAP approach, termed DO-UAP, which significantly reduces resource consumption while maintaining high attack performance. Specifically, we explore the necessity of multimodal loss design and introduce a useful data augmentation strategy. Extensive experiments conducted on three benchmark VLP datasets, six popular VLP models, and three classical downstream tasks demonstrate the efficiency and effectiveness of DO-UAP. Specifically, our approach drastically decreases the time consumption by 23-fold while achieving a better attack performance.
Abstract:The performance of disturbance observers is strongly influenced by the level of prior knowledge about the disturbance model. The simultaneous input and state estimation (SISE) algorithm is widely recognized for providing unbiased minimum-variance estimates under arbitrary disturbance models. In contrast, the Kalman filter-based disturbance observer (KF-DOB) achieves minimum mean-square error estimation when the disturbance model is fully specified. However, practical scenarios often fall between these extremes, where only partial knowledge of the disturbance model is available. This paper investigates the inherent bias-variance trade-off in KF-DOB when the disturbance model is incomplete. We further show that SISE can be interpreted as a special case of KF-DOB, where the disturbance noise covariance tends to infinity. To address this trade-off, we propose two novel estimators: the multi-kernel correntropy Kalman filter-based disturbance observer (MKCKF-DOB) and the interacting multiple models Kalman filter-based disturbance observer (IMMKF-DOB). Simulations verify the effectiveness of the proposed methods.
Abstract:The Coastal underwater evidence search system with surface-underwater collaboration is designed to revolutionize the search for artificial objects in coastal underwater environments, overcoming limitations associated with traditional methods such as divers and tethered remotely operated vehicles. Our innovative multi-robot collaborative system consists of three parts, an autonomous surface vehicle as a mission control center, a towed underwater vehicle for wide-area search, and a biomimetic underwater robot inspired by marine organisms for detailed inspections of identified areas. We conduct extensive simulations and real-world experiments in pond environments and coastal fields to demonstrate the system potential to surpass the limitations of conventional underwater search methods, offering a robust and efficient solution for law enforcement and recovery operations in marine settings.