Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuan Chen

IntenTest: Stress Testing for Intent Integrity in API-Calling LLM Agents

Jun 09, 2025

Shiwei Feng, Xiangzhe Xu, Xuan Chen, Kaiyuan Zhang, Syed Yusuf Ahmed, Zian Su, Mingwei Zheng, Xiangyu Zhang

Abstract:LLM agents are increasingly deployed to automate real-world tasks by invoking APIs through natural language instructions. While powerful, they often suffer from misinterpretation of user intent, leading to the agent's actions that diverge from the user's intended goal, especially as external toolkits evolve. Traditional software testing assumes structured inputs and thus falls short in handling the ambiguity of natural language. We introduce IntenTest, an API-centric stress testing framework that systematically uncovers intent integrity violations in LLM agents. Unlike prior work focused on fixed benchmarks or adversarial inputs, IntenTest generates realistic tasks based on toolkits' documentation and applies targeted mutations to expose subtle agent errors while preserving user intent. To guide testing, we propose semantic partitioning, which organizes natural language tasks into meaningful categories based on toolkit API parameters and their equivalence classes. Within each partition, seed tasks are mutated and ranked by a lightweight predictor that estimates the likelihood of triggering agent errors. To enhance efficiency, IntenTest maintains a datatype-aware strategy memory that retrieves and adapts effective mutation patterns from past cases. Experiments on 80 toolkit APIs demonstrate that IntenTest effectively uncovers intent integrity violations, significantly outperforming baselines in both error-exposing rate and query efficiency. Moreover, IntenTest generalizes well to stronger target models using smaller LLMs for test generation, and adapts to evolving APIs across domains.

Via

Access Paper or Ask Questions

LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs

Dec 06, 2024

Xuan Chen, Tong Lu, Zhichun Wang

Abstract:Entity Alignment (EA) seeks to identify and match corresponding entities across different Knowledge Graphs (KGs), playing a crucial role in knowledge fusion and integration. Embedding-based entity alignment (EA) has recently gained considerable attention, resulting in the emergence of many innovative approaches. Initially, these approaches concentrated on learning entity embeddings based on the structural features of knowledge graphs (KGs) as defined by relation triples. Subsequent methods have integrated entities' names and attributes as supplementary information to improve the embeddings used for EA. However, existing methods lack a deep semantic understanding of entity attributes and relations. In this paper, we propose a Large Language Model (LLM) based Entity Alignment method, LLM-Align, which explores the instruction-following and zero-shot capabilities of Large Language Models to infer alignments of entities. LLM-Align uses heuristic methods to select important attributes and relations of entities, and then feeds the selected triples of entities to an LLM to infer the alignment results. To guarantee the quality of alignment results, we design a multi-round voting mechanism to mitigate the hallucination and positional bias issues that occur with LLMs. Experiments on three EA datasets, demonstrating that our approach achieves state-of-the-art performance compared to existing EA methods.

Via

Access Paper or Ask Questions

DIGIMON: Diagnosis and Mitigation of Sampling Skew for Reinforcement Learning based Meta-Planner in Robot Navigation

Sep 17, 2024

Shiwei Feng, Xuan Chen, Zhiyuan Cheng, Zikang Xiong, Yifei Gao, Siyuan Cheng, Sayali Kate, Xiangyu Zhang

Abstract:Robot navigation is increasingly crucial across applications like delivery services and warehouse management. The integration of Reinforcement Learning (RL) with classical planning has given rise to meta-planners that combine the adaptability of RL with the explainable decision-making of classical planners. However, the exploration capabilities of RL-based meta-planners during training are often constrained by the capabilities of the underlying classical planners. This constraint can result in limited exploration, thereby leading to sampling skew issues. To address these issues, our paper introduces a novel framework, DIGIMON, which begins with behavior-guided diagnosis for exploration bottlenecks within the meta-planner and follows up with a mitigation strategy that conducts up-sampling from diagnosed bottleneck data. Our evaluation shows 13.5%+ improvement in navigation performance, greater robustness in out-of-distribution environments, and a 4x boost in training efficiency. DIGIMON is designed as a versatile, plug-and-play solution, allowing seamless integration into various RL-based meta-planners.

Via

Access Paper or Ask Questions

DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs

Aug 02, 2024

Zhichun Wang, Xuan Chen

Figure 1 for DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs

Figure 2 for DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs

Figure 3 for DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs

Figure 4 for DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs

Abstract:Entity Alignment (EA) aims to match equivalent entities in different Knowledge Graphs (KGs), which is essential for knowledge fusion and integration. Recently, embedding-based EA has attracted significant attention and many approaches have been proposed. Early approaches primarily focus on learning entity embeddings from the structural features of KGs, defined by relation triples. Later methods incorporated entities' names and attributes as auxiliary information to enhance embeddings for EA. However, these approaches often used different techniques to encode structural and attribute information, limiting their interaction and mutual enhancement. In this work, we propose a dense entity retrieval framework for EA, leveraging language models to uniformly encode various features of entities and facilitate nearest entity search across KGs. Alignment candidates are first generated through entity retrieval, which are subsequently reranked to determine the final alignments. We conduct comprehensive experiments on both cross-lingual and monolingual EA datasets, demonstrating that our approach achieves state-of-the-art performance compared to existing EA methods.

Via

Access Paper or Ask Questions

ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Aug 04, 2023

Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang

Abstract:Backdoor attacks have emerged as a prominent threat to natural language processing (NLP) models, where the presence of specific triggers in the input can lead poisoned models to misclassify these inputs to predetermined target classes. Current detection mechanisms are limited by their inability to address more covert backdoor strategies, such as style-based attacks. In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meanings of poisoned samples as they want to stay stealthy. Based on this observation, we hypothesize that while the model's predictions for paraphrased clean samples should remain stable, predictions for poisoned samples should revert to their true labels upon the mutations applied to triggers during the paraphrasing process. We employ ChatGPT, a state-of-the-art large language model, as our paraphraser and formulate the trigger-removal task as a prompt engineering problem. We adopt fuzzing, a technique commonly used for unearthing software vulnerabilities, to discover optimal paraphrase prompts that can effectively eliminate triggers while concurrently maintaining input semantics. Experiments on 4 types of backdoor attacks, including the subtle style backdoors, and 4 distinct datasets demonstrate that our approach surpasses baseline methods, including STRIP, RAP, and ONION, in precision and recall.

Via

Access Paper or Ask Questions

Feature Noise Boosts DNN Generalization under Label Noise

Aug 03, 2023

Lu Zeng, Xuan Chen, Xiaoshuang Shi, Heng Tao Shen

Figure 1 for Feature Noise Boosts DNN Generalization under Label Noise

Figure 2 for Feature Noise Boosts DNN Generalization under Label Noise

Figure 3 for Feature Noise Boosts DNN Generalization under Label Noise

Figure 4 for Feature Noise Boosts DNN Generalization under Label Noise

Abstract:The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs). In this study, we introduce and theoretically demonstrate a simple feature noise method, which directly adds noise to the features of training data, can enhance the generalization of DNNs under label noise. Specifically, we conduct theoretical analyses to reveal that label noise leads to weakened DNN generalization by loosening the PAC-Bayes generalization bound, and feature noise results in better DNN generalization by imposing an upper bound on the mutual information between the model weights and the features, which constrains the PAC-Bayes generalization bound. Furthermore, to ensure effective generalization of DNNs in the presence of label noise, we conduct application analyses to identify the optimal types and levels of feature noise to add for obtaining desirable label noise generalization. Finally, extensive experimental results on several popular datasets demonstrate the feature noise method can significantly enhance the label noise generalization of the state-of-the-art label noise method.

Via

Access Paper or Ask Questions

Low Complexity First: Duration-Centric ISI Mitigation in Molecular Communication via Diffusion

Jul 19, 2022

Xuan Chen, Fei Ji, Miaowen Wen, Yu Huang, Yuankun Tang, Andrew W. Eckford

Figure 1 for Low Complexity First: Duration-Centric ISI Mitigation in Molecular Communication via Diffusion

Figure 2 for Low Complexity First: Duration-Centric ISI Mitigation in Molecular Communication via Diffusion

Figure 3 for Low Complexity First: Duration-Centric ISI Mitigation in Molecular Communication via Diffusion

Figure 4 for Low Complexity First: Duration-Centric ISI Mitigation in Molecular Communication via Diffusion

Abstract:In this paper, we propose a novel inter-symbol interference (ISI) mitigation scheme for molecular communication via diffusion (MCvD) systems with the optimal detection interval. Its rationale is to exploit the discarded duration (i.e., the symbol duration outside this optimal interval) to relieve ISI in the target system. Following this idea, we formulate an objective function to quantify the impact of the discarded time on bit error rate (BER) performance. Besides, an optimally reusable interval within the discarded duration is derived in closed form, which applies to both the absorbing and passive receivers. Finally, numerical results validate our analysis and show that for the considered MCvD system, significant BER improvements can be achieved by using the derived reusable duration.

* 5 pages, 5 figures. arXiv admin note: text overlap with arXiv:2204.08636

Via

Access Paper or Ask Questions

Detection Interval for Diffusion Molecular Communication: How Long is Enough?

Apr 19, 2022

Xuan Chen, Miaowen Wen, Fei Ji, Yu Huang, Yuankun Tang, Andrew W. Eckford

Figure 1 for Detection Interval for Diffusion Molecular Communication: How Long is Enough?

Figure 2 for Detection Interval for Diffusion Molecular Communication: How Long is Enough?

Figure 3 for Detection Interval for Diffusion Molecular Communication: How Long is Enough?

Figure 4 for Detection Interval for Diffusion Molecular Communication: How Long is Enough?

Abstract:Molecular communication has a key role to play in future medical applications, including detecting, analyzing, and addressing infectious disease outbreaks. Overcoming inter-symbol interference (ISI) is one of the key challenges in the design of molecular communication systems. In this paper, we propose to optimize the detection interval to minimize the impact of ISI while ensuring the accurate detection of the transmitted information symbol, which is suitable for the absorbing and passive receivers. For tractability, based on the signal-to-interference difference (SID) and signal-to-interference-and-noise amplitude ratio (SINAR), we propose a modified-SINAR (mSINAR) to measure the bit error rate (BER) performance for the molecular communication system with a variable detection interval. Besides, we derive the optimal detection interval in closed form. Using simulation results, we show that the BER performance of our proposed mSINAR scheme is superior to the competing schemes, and achieves similar performance to optimal intervals found by the exhaustive search.

Via

Access Paper or Ask Questions

Towards Behavior-Level Explanation for Deep Reinforcement Learning

Sep 17, 2020

Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta

Figure 1 for Towards Behavior-Level Explanation for Deep Reinforcement Learning

Figure 2 for Towards Behavior-Level Explanation for Deep Reinforcement Learning

Figure 3 for Towards Behavior-Level Explanation for Deep Reinforcement Learning

Figure 4 for Towards Behavior-Level Explanation for Deep Reinforcement Learning

Abstract:While Deep Neural Networks (DNNs) are becoming the state-of-the-art for many tasks including reinforcement learning (RL), they are especially resistant to human scrutiny and understanding. Input attributions have been a foundational building block for DNN expalainabilty but face new challenges when applied to deep RL. We address the challenges with two novel techniques. We define a class of \emph{behaviour-level attributions} for explaining agent behaviour beyond input importance and interpret existing attribution methods on the behaviour level. We then introduce \emph{$\lambda$-alignment}, a metric for evaluating the performance of behaviour-level attributions methods in terms of whether they are indicative of the agent actions they are meant to explain. Our experiments on Atari games suggest that perturbation-based attribution methods are significantly more suitable to deep RL than alternatives from the perspective of this metric. We argue that our methods demonstrate the minimal set of considerations for adopting general DNN explanation technology to the unique aspects of reinforcement learning and hope the outlined direction can serve as a basis for future research on understanding Deep RL using attribution.

Via

Access Paper or Ask Questions