Abstract:Object detection models are vulnerable to backdoor attacks, where attackers poison a small subset of training samples by embedding a predefined trigger to manipulate prediction. Detecting poisoned samples (i.e., those containing triggers) at test time can prevent backdoor activation. However, unlike image classification tasks, the unique characteristics of object detection -- particularly its output of numerous objects -- pose fresh challenges for backdoor detection. The complex attack effects (e.g., "ghost" object emergence or "vanishing" object) further render current defenses fundamentally inadequate. To this end, we design TRAnsformation Consistency Evaluation (TRACE), a brand-new method for detecting poisoned samples at test time in object detection. Our journey begins with two intriguing observations: (1) poisoned samples exhibit significantly more consistent detection results than clean ones across varied backgrounds. (2) clean samples show higher detection consistency when introduced to different focal information. Based on these phenomena, TRACE applies foreground and background transformations to each test sample, then assesses transformation consistency by calculating the variance in objects confidences. TRACE achieves black-box, universal backdoor detection, with extensive experiments showing a 30% improvement in AUROC over state-of-the-art defenses and resistance to adaptive attacks.
Abstract:Can we simulate a sandbox society with generative agents to model human behavior, thereby reducing the over-reliance on real human trials for assessing public policies? In this work, we investigate the feasibility of simulating health-related decision-making, using vaccine hesitancy, defined as the delay in acceptance or refusal of vaccines despite the availability of vaccination services (MacDonald, 2015), as a case study. To this end, we introduce the VacSim framework with 100 generative agents powered by Large Language Models (LLMs). VacSim simulates vaccine policy outcomes with the following steps: 1) instantiate a population of agents with demographics based on census data; 2) connect the agents via a social network and model vaccine attitudes as a function of social dynamics and disease-related information; 3) design and evaluate various public health interventions aimed at mitigating vaccine hesitancy. To align with real-world results, we also introduce simulation warmup and attitude modulation to adjust agents' attitudes. We propose a series of evaluations to assess the reliability of various LLM simulations. Experiments indicate that models like Llama and Qwen can simulate aspects of human behavior but also highlight real-world alignment challenges, such as inconsistent responses with demographic profiles. This early exploration of LLM-driven simulations is not meant to serve as definitive policy guidance; instead, it serves as a call for action to examine social simulation for policy development.
Abstract:In this paper, we introduce a kinodynamic model predictive control (MPC) framework that exploits unidirectional parallel springs (UPS) to improve the energy efficiency of dynamic legged robots. The proposed method employs a hierarchical control structure, where the solution of MPC with simplified dynamic models is used to warm-start the kinodynamic MPC, which accounts for nonlinear centroidal dynamics and kinematic constraints. The proposed approach enables energy efficient dynamic hopping on legged robots by using UPS to reduce peak motor torques and energy consumption during stance phases. Simulation results demonstrated a 38.8% reduction in the cost of transport (CoT) for a monoped robot equipped with UPS during high-speed hopping. Additionally, preliminary hardware experiments show a 14.8% reduction in energy consumption. Video: https://youtu.be/AF11qMXJD48
Abstract:We investigate double descent and scaling laws in terms of weights rather than the number of parameters. Specifically, we analyze linear and random features models using the deterministic equivalence approach from random matrix theory. We precisely characterize how the weights norm concentrate around deterministic quantities and elucidate the relationship between the expected test error and the norm-based capacity (complexity). Our results rigorously answer whether double descent exists under norm-based capacity and reshape the corresponding scaling laws. Moreover, they prompt a rethinking of the data-parameter paradigm - from under-parameterized to over-parameterized regimes - by shifting the focus to norms (weights) rather than parameter count.
Abstract:As deep neural networks (DNNs) are widely applied in the physical world, many researches are focusing on physical-world adversarial examples (PAEs), which introduce perturbations to inputs and cause the model's incorrect outputs. However, existing PAEs face two challenges: unsatisfactory attack performance (i.e., poor transferability and insufficient robustness to environment conditions), and difficulty in balancing attack effectiveness with stealthiness, where better attack effectiveness often makes PAEs more perceptible. In this paper, we explore a novel perturbation-based method to overcome the challenges. For the first challenge, we introduce a strategy Deceptive RF injection based on robust features (RFs) that are predictive, robust to perturbations, and consistent across different models. Specifically, it improves the transferability and robustness of PAEs by covering RFs of other classes onto the predictive features in clean images. For the second challenge, we introduce another strategy Adversarial Semantic Pattern Minimization, which removes most perturbations and retains only essential adversarial patterns in AEsBased on the two strategies, we design our method Robust Feature Coverage Attack (RFCoA), comprising Robust Feature Disentanglement and Adversarial Feature Fusion. In the first stage, we extract target class RFs in feature space. In the second stage, we use attention-based feature fusion to overlay these RFs onto predictive features of clean images and remove unnecessary perturbations. Experiments show our method's superior transferability, robustness, and stealthiness compared to existing state-of-the-art methods. Additionally, our method's effectiveness can extend to Large Vision-Language Models (LVLMs), indicating its potential applicability to more complex tasks.
Abstract:With the significant advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly. Previous studies have highlighted VLMs' vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards. However, existing methods struggle against state-of-the-art VLMs like GPT-4o, due to the over-exposure of harmful content and lack of stealthy malicious guidance. In this work, we propose a novel jailbreak attack framework: Multi-Modal Linkage (MML) Attack. Drawing inspiration from cryptography, MML utilizes an encryption-decryption process across text and image modalities to mitigate over-exposure of malicious information. To align the model's output with malicious intent covertly, MML employs a technique called "evil alignment", framing the attack within a video game production scenario. Comprehensive experiments demonstrate MML's effectiveness. Specifically, MML jailbreaks GPT-4o with attack success rates of 97.80% on SafeBench, 98.81% on MM-SafeBench and 99.07% on HADES-Dataset. Our code is available at https://github.com/wangyu-ovo/MML
Abstract:Accurate origin-destination (OD) passenger flow prediction is crucial for enhancing metro system efficiency, optimizing scheduling, and improving passenger experiences. However, current models often fail to effectively capture the asynchronous departure characteristics of OD flows and underutilize the inflow and outflow data, which limits their prediction accuracy. To address these issues, we propose CSP-AIT-Net, a novel spatiotemporal graph attention framework designed to enhance OD flow prediction by incorporating asynchronous inflow tracking and advanced station semantics representation. Our framework restructures the OD flow prediction paradigm by first predicting outflows and then decomposing OD flows using a spatiotemporal graph attention mechanism. To enhance computational efficiency, we introduce a masking mechanism and propose asynchronous passenger flow graphs that integrate inflow and OD flow with conservation constraints. Furthermore, we employ contrastive learning to extract high-dimensional land use semantics of metro stations, enriching the contextual understanding of passenger mobility patterns. Validation of the Shanghai metro system demonstrates improvement in short-term OD flow prediction accuracy over state-of-the-art methods. This work contributes to enhancing metro operational efficiency, scheduling precision, and overall system safety.
Abstract:With strong expressive capabilities in Large Language Models(LLMs), generative models effectively capture sentiment structures and deep semantics, however, challenges remain in fine-grained sentiment classification across multi-lingual and complex contexts. To address this, we propose the Sentiment Cross-Lingual Recognition and Logic Framework (SentiXRL), which incorporates two modules,an emotion retrieval enhancement module to improve sentiment classification accuracy in complex contexts through historical dialogue and logical reasoning,and a self-circulating analysis negotiation mechanism (SANM)to facilitates autonomous decision-making within a single model for classification tasks.We have validated SentiXRL's superiority on multiple standard datasets, outperforming existing models on CPED and CH-SIMS,and achieving overall better performance on MELD,Emorynlp and IEMOCAP. Notably, we unified labels across several fine-grained sentiment annotation datasets and conducted category confusion experiments, revealing challenges and impacts of class imbalance in standard datasets.
Abstract:In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts. Heterogeneous GNNs have shown remarkable success in extracting low-dimensional embeddings from complex graphs that encompass diverse entity types and relationships. While meta-path-based techniques have long been recognized for their ability to capture semantic affinities among nodes, their dependence on manual specification poses a significant limitation. In contrast, matrix-focused methods accelerate processing by utilizing structural cues but often overlook contextual richness. In this paper, we challenge the current paradigm by introducing ontology as a fundamental semantic primitive within complex graphs. Our goal is to integrate the strengths of both matrix-centric and meta-path-based approaches into a unified framework. We propose perturbation Ontology-based Graph Attention Networks (POGAT), a novel methodology that combines ontology subgraphs with an advanced self-supervised learning paradigm to achieve a deep contextual understanding. The core innovation of POGAT lies in our enhanced homogeneous perturbing scheme designed to generate rigorous negative samples, encouraging the model to explore minimal contextual features more thoroughly. Through extensive empirical evaluations, we demonstrate that POGAT significantly outperforms state-of-the-art baselines, achieving a groundbreaking improvement of up to 10.78\% in F1-score for the critical task of link prediction and 12.01\% in Micro-F1 for the critical task of node classification.
Abstract:Freight truck-related crashes pose significant challenges, leading to substantial economic losses, injuries, and fatalities, with pronounced spatial disparities across different regions. This study adopts a transport geography perspective to examine spatial justice concerns by employing deep counterfactual inference models to analyze how socioeconomic disparities, road infrastructure, and environmental conditions influence the geographical distribution and severity of freight truck crashes. By integrating road network datasets, socioeconomic attributes, and crash records from the Los Angeles metropolitan area, this research provides a nuanced spatial analysis of how different communities are disproportionately impacted. The results reveal significant spatial disparities in crash severity across areas with varying population densities, income levels, and minority populations, highlighting the pivotal role of infrastructural and environmental improvements in mitigating these disparities. The findings offer insights into targeted, location-specific policy interventions, suggesting enhancements in road infrastructure, lighting, and traffic control systems, particularly in low-income and minority-concentrated areas. This research contributes to the literature on transport geography and spatial equity by providing data-driven insights into effective measures for reducing spatial injustices associated with freight truck-related crashes.