Victor
Abstract:Adversarial camouflage is a widely used physical attack against vehicle detectors for its superiority in multi-view attack performance. One promising approach involves using differentiable neural renderers to facilitate adversarial camouflage optimization through gradient back-propagation. However, existing methods often struggle to capture environmental characteristics during the rendering process or produce adversarial textures that can precisely map to the target vehicle. Moreover, these approaches neglect diverse weather conditions, reducing the efficacy of generated camouflage across varying weather scenarios. To tackle these challenges, we propose a robust and accurate camouflage generation method, namely RAUCA. The core of RAUCA is a novel neural rendering component, End-to-End Neural Renderer Plus (E2E-NRP), which can accurately optimize and project vehicle textures and render images with environmental characteristics such as lighting and weather. In addition, we integrate a multi-weather dataset for camouflage generation, leveraging the E2E-NRP to enhance the attack robustness. Experimental results on six popular object detectors show that RAUCA-final outperforms existing methods in both simulation and real-world settings.
Abstract:To tackle the challenges of large language model performance in natural language to SQL tasks, we introduce XiYan-SQL, an innovative framework that employs a multi-generator ensemble strategy to improve candidate generation. We introduce M-Schema, a semi-structured schema representation method designed to enhance the understanding of database structures. To enhance the quality and diversity of generated candidate SQL queries, XiYan-SQL integrates the significant potential of in-context learning (ICL) with the precise control of supervised fine-tuning. On one hand, we propose a series of training strategies to fine-tune models to generate high-quality candidates with diverse preferences. On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities. The refiner optimizes each candidate by correcting logical or syntactical errors. To address the challenge of identifying the best candidate, we fine-tune a selection model to distinguish nuances of candidate SQL queries. The experimental results on multiple dialect datasets demonstrate the robustness of XiYan-SQL in addressing challenges across different scenarios. Overall, our proposed XiYan-SQL achieves the state-of-the-art execution accuracy of 89.65% on the Spider test set, 69.86% on SQL-Eval, 41.20% on NL2GQL, and a competitive score of 72.23% on the Bird development benchmark. The proposed framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods.
Abstract:Understanding the extent of urban flooding is crucial for assessing building damage, casualties and economic losses. Synthetic Aperture Radar (SAR) technology offers significant advantages for mapping flooded urban areas due to its ability to collect data regardless weather and solar illumination conditions. However, the wide range of existing methods makes it difficult to choose the best approach for a specific situation and to identify future research directions. Therefore, this study provides a comprehensive review of current research on urban flood mapping using SAR data, summarizing key characteristics of floodwater in SAR images and outlining various approaches from scientific articles. Additionally, we provide a brief overview of the advantages and disadvantages of each method category, along with guidance on selecting the most suitable approach for different scenarios. This study focuses on the challenges and advancements in SAR-based urban flood mapping. It specifically addresses the limitations of spatial and temporal resolution in SAR data and discusses the essential pre-processing steps. Moreover, the article explores the potential benefits of Polarimetric SAR (PolSAR) techniques and uncertainty analysis for future research. Furthermore, it highlights a lack of open-access SAR datasets for urban flood mapping, hindering development in advanced deep learning-based methods. Besides, we evaluated the Technology Readiness Levels (TRLs) of urban flood mapping techniques to identify challenges and future research areas. Finally, the study explores the practical applications of SAR-based urban flood mapping in both the private and public sectors and provides a comprehensive overview of the benefits and potential impact of these methods.
Abstract:Continual learning requires to overcome catastrophic forgetting when training a single model on a sequence of tasks. Recent top-performing approaches are prompt-based methods that utilize a set of learnable parameters (i.e., prompts) to encode task knowledge, from which appropriate ones are selected to guide the fixed pre-trained model in generating features tailored to a certain task. However, existing methods rely on predicting prompt identities for prompt selection, where the identity prediction process cannot be optimized with task loss. This limitation leads to sub-optimal prompt selection and inadequate adaptation of pre-trained features for a specific task. Previous efforts have tried to address this by directly generating prompts from input queries instead of selecting from a set of candidates. However, these prompts are continuous, which lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. In this way, VQ-Prompt can optimize the prompt selection process with task loss and meanwhile achieve effective abstraction of task knowledge for continual learning. Extensive experiments show that VQ-Prompt outperforms state-of-the-art continual learning methods across a variety of benchmarks under the challenging class-incremental setting. The code is available at \href{https://github.com/jiaolifengmi/VQ-Prompt}{this https URL}.
Abstract:The improvement in translating natural language to structured query language (SQL) can be attributed to the advancements in large language models (LLMs). Open-source LLMs, tailored for specific database dialects such as MySQL, have shown great performance. However, cloud service providers are looking for a unified database manager service (e.g., Cosmos DB from Azure, Amazon Aurora from AWS, Lindorm from AlibabaCloud) that can support multiple dialects. This requirement has led to the concept of multi-dialect query generation, which presents challenges to LLMs. These challenges include syntactic differences among dialects and imbalanced data distribution across multiple dialects. To tackle these challenges, we propose MoMQ, a novel Mixture-of-Experts-based multi-dialect query generation framework across both relational and non-relational databases. MoMQ employs a dialect expert group for each dialect and a multi-level routing strategy to handle dialect-specific knowledge, reducing interference during query generation. Additionally, a shared expert group is introduced to address data imbalance, facilitating the transfer of common knowledge from high-resource dialects to low-resource ones. Furthermore, we have developed a high-quality multi-dialect query generation benchmark that covers relational and non-relational databases such as MySQL, PostgreSQL, Cypher for Neo4j, and nGQL for NebulaGraph. Extensive experiments have shown that MoMQ performs effectively and robustly even in resource-imbalanced scenarios.
Abstract:With the extensive deployment of Large Language Models (LLMs), ensuring their safety has become increasingly critical. However, existing defense methods often struggle with two key issues: (i) inadequate defense capabilities, particularly in domain-specific scenarios like chemistry, where a lack of specialized knowledge can lead to the generation of harmful responses to malicious queries. (ii) over-defensiveness, which compromises the general utility and responsiveness of LLMs. To mitigate these issues, we introduce a multi-agents-based defense framework, Guide for Defense (G4D), which leverages accurate external information to provide an unbiased summary of user intentions and analytically grounded safety response guidance. Extensive experiments on popular jailbreak attacks and benign datasets show that our G4D can enhance LLM's robustness against jailbreak attacks on general and domain-specific scenarios without compromising the model's general functionality.
Abstract:Antibodies are proteins produced by the immune system that recognize and bind to specific antigens, and their 3D structures are crucial for understanding their binding mechanism and designing therapeutic interventions. The specificity of antibody-antigen binding predominantly depends on the complementarity-determining regions (CDR) within antibodies. Despite recent advancements in antibody structure prediction, the quality of predicted CDRs remains suboptimal. In this paper, we develop a novel antibody structure refinement method termed FlowAB based on energy-guided flow matching. FlowAB adopts the powerful deep generative method SE(3) flow matching and simultaneously incorporates important physical prior knowledge into the flow model to guide the generation process. The extensive experiments demonstrate that FlowAB can significantly improve the antibody CDR structures. It achieves new state-of-the-art performance on the antibody structure prediction task when used in conjunction with an appropriate prior model while incurring only marginal computational overhead. This advantage makes FlowAB a practical tool in antibody engineering.
Abstract:The increasing integration of large language models (LLMs) across various fields has heightened concerns about their potential to propagate dangerous information. This paper specifically explores the security vulnerabilities of LLMs within the field of chemistry, particularly their capacity to provide instructions for synthesizing hazardous substances. We evaluate the effectiveness of several prompt injection attack methods, including red-teaming, explicit prompting, and implicit prompting. Additionally, we introduce a novel attack technique named SMILES-prompting, which uses the Simplified Molecular-Input Line-Entry System (SMILES) to reference chemical substances. Our findings reveal that SMILES-prompting can effectively bypass current safety mechanisms. These findings highlight the urgent need for enhanced domain-specific safeguards in LLMs to prevent misuse and improve their potential for positive social impact.
Abstract:Generating emotionally appropriate responses in conversations with large language models presents a significant challenge due to the complexities of human emotions and cognitive processes, which remain largely underexplored in their critical role in social interactions. In this study, we introduce a two-stage automatic data generation framework to create CAPE, a Chinese dataset named Cognitive Appraisal theory-based Emotional corpus. This corpus facilitates the generation of dialogues with contextually appropriate emotional responses by accounting for diverse personal and situational factors. We propose two tasks utilizing this dataset: emotion prediction and next utterance prediction. Both automated and human evaluations demonstrate that agents trained on our dataset can deliver responses that are more aligned with human emotional expressions. Our study shows the potential for advancing emotional expression in conversational agents, paving the way for more nuanced and meaningful human-computer interactions.
Abstract:Recent advancements in Large Language Models (LLMs) have led to a rapid growth of agentic systems capable of handling a wide range of complex tasks. However, current research largely relies on manual, task-specific design, limiting their adaptability to novel tasks. In this paper, we introduce a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with uniform IO interface: Planning, Reasoning, Tool Use, and Memory. Building on this design space, we present a novel LLM agent search framework called AgentSquare, which introduces two core mechanisms, i.e., module evolution and recombination, to efficiently search for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses in-context surrogate models to skip unpromising agent designs. Extensive experiments across six benchmarks, covering the diverse scenarios of web, embodied, tool use and game applications, show that AgentSquare substantially outperforms hand-crafted agents, achieving an average performance gain of 17.2% against best-known human designs. Moreover, AgentSquare can generate interpretable design insights, enabling a deeper understanding of agentic architecture and its impact on task performance. We believe that the modular design space and AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and consolidating the collective efforts of research community. Code repo is available at https://github.com/tsinghua-fib-lab/AgentSquare.