Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

He Chen

Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation

Apr 22, 2025

Ning Wang, Zihan Yan, Weiyang Li, Chuan Ma, He Chen, Tao Xiang

Abstract:Embodied agents exhibit immense potential across a multitude of domains, making the assurance of their behavioral safety a fundamental prerequisite for their widespread deployment. However, existing research predominantly concentrates on the security of general large language models, lacking specialized methodologies for establishing safety benchmarks and input moderation tailored to embodied agents. To bridge this gap, this paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents. This framework encompasses the entire pipeline, including taxonomy definition, dataset curation, moderator architecture, model training, and rigorous evaluation. Notably, we introduce EAsafetyBench, a meticulously crafted safety benchmark engineered to facilitate both the training and stringent assessment of moderators specifically designed for embodied agents. Furthermore, we propose Pinpoint, an innovative prompt-decoupled input moderation scheme that harnesses a masked attention mechanism to effectively isolate and mitigate the influence of functional prompts on moderation tasks. Extensive experiments conducted on diverse benchmark datasets and models validate the feasibility and efficacy of the proposed approach. The results demonstrate that our methodologies achieve an impressive average detection accuracy of 94.58%, surpassing the performance of existing state-of-the-art techniques, alongside an exceptional moderation processing time of merely 0.002 seconds per instance.

* 9 pages

Via

Access Paper or Ask Questions

EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery

Apr 17, 2025

Wei Zhang, Miaoxin Cai, Yaqian Ning, Tong Zhang, Yin Zhuang, He Chen, Jun Li, Xuerui Mao

Abstract:Recent advances in the visual-language area have developed natural multi-modal large language models (MLLMs) for spatial reasoning through visual prompting. However, due to remote sensing (RS) imagery containing abundant geospatial information that differs from natural images, it is challenging to effectively adapt natural spatial models to the RS domain. Moreover, current RS MLLMs are limited in overly narrow interpretation levels and interaction manner, hindering their applicability in real-world scenarios. To address those challenges, a spatial MLLM named EarthGPT-X is proposed, enabling a comprehensive understanding of multi-source RS imagery, such as optical, synthetic aperture radar (SAR), and infrared. EarthGPT-X offers zoom-in and zoom-out insight, and possesses flexible multi-grained interactive abilities. Moreover, EarthGPT-X unifies two types of critical spatial tasks (i.e., referring and grounding) into a visual prompting framework. To achieve these versatile capabilities, several key strategies are developed. The first is the multi-modal content integration method, which enhances the interplay between images, visual prompts, and text instructions. Subsequently, a cross-domain one-stage fusion training strategy is proposed, utilizing the large language model (LLM) as a unified interface for multi-source multi-task learning. Furthermore, by incorporating a pixel perception module, the referring and grounding tasks are seamlessly unified within a single framework. In addition, the experiments conducted demonstrate the superiority of the proposed EarthGPT-X in multi-grained tasks and its impressive flexibility in multi-modal interaction, revealing significant advancements of MLLM in the RS field.

Via

Access Paper or Ask Questions

FuseGrasp: Radar-Camera Fusion for Robotic Grasping of Transparent Objects

Feb 28, 2025

Hongyu Deng, Tianfan Xue, He Chen

Abstract:Transparent objects are prevalent in everyday environments, but their distinct physical properties pose significant challenges for camera-guided robotic arms. Current research is mainly dependent on camera-only approaches, which often falter in suboptimal conditions, such as low-light environments. In response to this challenge, we present FuseGrasp, the first radar-camera fusion system tailored to enhance the transparent objects manipulation. FuseGrasp exploits the weak penetrating property of millimeter-wave (mmWave) signals, which causes transparent materials to appear opaque, and combines it with the precise motion control of a robotic arm to acquire high-quality mmWave radar images of transparent objects. The system employs a carefully designed deep neural network to fuse radar and camera imagery, thereby improving depth completion and elevating the success rate of object grasping. Nevertheless, training FuseGrasp effectively is non-trivial, due to limited radar image datasets for transparent objects. We address this issue utilizing large RGB-D dataset, and propose an effective two-stage training approach: we first pre-train FuseGrasp on a large public RGB-D dataset of transparent objects, then fine-tune it on a self-built small RGB-D-Radar dataset. Furthermore, as a byproduct, FuseGrasp can determine the composition of transparent objects, such as glass or plastic, leveraging the material identification capability of mmWave radar. This identification result facilitates the robotic arm in modulating its grip force appropriately. Extensive testing reveals that FuseGrasp significantly improves the accuracy of depth reconstruction and material identification for transparent objects. Moreover, real-world robotic trials have confirmed that FuseGrasp markedly enhances the handling of transparent items. A video demonstration of FuseGrasp is available at https://youtu.be/MWDqv0sRSok.

* 16 pages, 20 figures, accepted by IEEE TMC

Via

Access Paper or Ask Questions

DeepCRF: Deep Learning-Enhanced CSI-Based RF Fingerprinting for Channel-Resilient WiFi Device Identification

Nov 11, 2024

Ruiqi Kong, He Chen

Figure 1 for DeepCRF: Deep Learning-Enhanced CSI-Based RF Fingerprinting for Channel-Resilient WiFi Device Identification

Figure 2 for DeepCRF: Deep Learning-Enhanced CSI-Based RF Fingerprinting for Channel-Resilient WiFi Device Identification

Figure 3 for DeepCRF: Deep Learning-Enhanced CSI-Based RF Fingerprinting for Channel-Resilient WiFi Device Identification

Figure 4 for DeepCRF: Deep Learning-Enhanced CSI-Based RF Fingerprinting for Channel-Resilient WiFi Device Identification

Abstract:This paper presents DeepCRF, a new framework that harnesses deep learning to extract subtle micro-signals from channel state information (CSI) measurements, enabling robust and resilient radio-frequency fingerprinting (RFF) of commercial-off-the-shelf (COTS) WiFi devices across diverse channel conditions. Building on our previous research, which demonstrated that micro-signals in CSI, termed micro-CSI, most likely originate from RF circuitry imperfections and can serve as unique RF fingerprints, we develop a new approach to overcome the limitations of our prior signal space-based method. While the signal space-based method is effective in strong line-of-sight (LoS) conditions, we show that it struggles with the complexities of non-line-of-sight (NLoS) scenarios, compromising the robustness of CSI-based RFF. To address this challenge, DeepCRF incorporates a carefully trained convolutional neural network (CNN) with model-inspired data augmentation, supervised contrastive learning, and decision fusion techniques, enhancing its generalization capabilities across unseen channel conditions and resilience against noise. Our evaluations demonstrate that DeepCRF significantly improves device identification accuracy across diverse channels, outperforming both the signal space-based baseline and state-of-the-art neural network-based benchmarks. Notably, it achieves an average identification accuracy of 99.53% among 19 COTS WiFi network interface cards in real-world unseen scenarios using 4 CSI measurements per identification procedure.

Via

Access Paper or Ask Questions

Spurious Stationarity and Hardness Results for Mirror Descent

Apr 11, 2024

He Chen, Jiajin Li, Anthony Man-Cho So

Figure 1 for Spurious Stationarity and Hardness Results for Mirror Descent

Abstract:Despite the considerable success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, a critical question remains: Can existing stationarity measures, often based on Bregman divergence, reliably distinguish between stationary and non-stationary points? In this paper, we present a groundbreaking finding: All existing stationarity measures necessarily imply the existence of spurious stationary points. We further establish an algorithmic independent hardness result: Bregman proximal-type algorithms are unable to escape from a spurious stationary point in finite steps when the initial point is unfavorable, even for convex problems. Our hardness result points out the inherent distinction between Euclidean and Bregman geometries, and introduces both fundamental theoretical and numerical challenges to both machine learning and optimization communities.

Via

Access Paper or Ask Questions

Towards Channel-Resilient CSI-Based RF Fingerprinting using Deep Learning

Mar 23, 2024

Ruiqi Kong, He Chen

Abstract:This work introduces DeepCRF, a deep learning framework designed for channel state information-based radio frequency fingerprinting (CSI-RFF). The considered CSI-RFF is built on micro-CSI, a recently discovered radio-frequency (RF) fingerprint that manifests as micro-signals appearing on the channel state information (CSI) curves of commercial WiFi devices. Micro-CSI facilitates CSI-RFF which is more streamlined and easily implementable compared to existing schemes that rely on raw I/Q samples. The primary challenge resides in the precise extraction of micro-CSI from the inherently fluctuating CSI measurements, a process critical for reliable RFF. The construction of a framework that is resilient to channel variability is essential for the practical deployment of CSI-RFF techniques. DeepCRF addresses this challenge with a thoughtfully trained convolutional neural network (CNN). This network's performance is significantly enhanced by employing effective and strategic data augmentation techniques, which bolster its ability to generalize to novel, unseen channel conditions. Furthermore, DeepCRF incorporates supervised contrastive learning to enhance its robustness against noises. Our evaluations demonstrate that DeepCRF significantly enhances the accuracy of device identification across previously unencountered channels. It outperforms both the conventional model-based methods and standard CNN that lack our specialized training and enhancement strategies.

* 6 pages,5 figures, INFOCOM WKSHPS 2024

Via

Access Paper or Ask Questions

Physical-Layer Authentication of Commodity Wi-Fi Devices via Micro-Signals on CSI Curves

Aug 01, 2023

Ruiqi Kong, He Chen

Abstract:This paper presents a new radiometric fingerprint that is revealed by micro-signals in the channel state information (CSI) curves extracted from commodity Wi-Fi devices. We refer to this new fingerprint as "micro-CSI". Our experiments show that micro-CSI is likely to be caused by imperfections in the radio-frequency circuitry and is present in Wi-Fi 4/5/6 network interface cards (NICs). We conducted further experiments to determine the most effective CSI collection configuration to stabilize micro-CSI. To extract micro-CSI from varying CSI curves, we developed a signal space-based extraction algorithm that effectively separates distortions caused by wireless channels and hardware imperfections under line-of-sight (LoS) scenarios. Finally, we implemented a micro-CSI-based device authentication algorithm that uses the k-Nearest Neighbors (KNN) method to identify 11 COTS Wi-Fi NICs from the same manufacturer in typical indoor environments. Our experimental results demonstrate that the micro-CSI-based authentication algorithm can achieve an average attack detection rate of over 99% with a false alarm rate of 0%.

* 5 pages, 3 figures, conference

Via

Access Paper or Ask Questions

RF-Based Simultaneous Localization and Source Seeking for Multi-Robot Systems

Jun 27, 2023

Ke Xu, Rui Zhang, He Chen

Abstract:This paper considers a radio-frequency (RF)-based simultaneous localization and source-seeking (SLASS) problem in multi-robot systems, where multiple robots jointly localize themselves and an RF source using distance-only measurements extracted from RF signals and then control themselves to approach the source. We design a Rao-Blackwellized particle filter-based algorithm to realize the joint localization of the robots and the source. We also devise an information-theoretic control policy for the robots to approach the source. In our control policy, we maximize the predicted mutual information between the source position and the distance measurements, conditioned on the robot positions, to incorporate the robot localization uncertainties. A projected gradient ascent method is adopted to solve the mutual information maximization problem. Simulation results show that the proposed SLASS framework outperforms two benchmarks in terms of the root mean square error (RMSE) of the estimated source position and the decline of the distances between the robots and the source, indicating more effective approaching of the robots to the source.

Via

Access Paper or Ask Questions

Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing

Jun 27, 2023

Ke Xu, He Chen, Chenshu Wu

Abstract:Due to the finite bandwidth of practical wireless systems, one multipath component can manifest itself as a discrete pulse consisting of multiple taps in the digital delay domain. This effect is called channel leakage, which complicates the multipath delay estimation problem. In this paper, we develop a new algorithm to estimate multipath delays of leaked channels by leveraging the knowledge of pulse-shaping functions, which can be used to support fine-grained WiFi sensing applications. Specifically, we express the channel impulse response (CIR) as a linear combination of overcomplete basis vectors corresponding to different delays. Considering the limited number of paths in physical environments, we formulate the multipath delay estimation as a sparse recovery problem. We then propose a sparse Bayesian learning (SBL) method to estimate the sparse vector and determine the number of physical paths and their associated delay parameters from the positions of the nonzero entries in the sparse vector. Simulation results show that our algorithm can accurately determine the number of paths, and achieve superior accuracy in path delay estimation and channel reconstruction compared to two benchmarking schemes.

Via

Access Paper or Ask Questions

Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

Jul 08, 2022

Tong Zhang, Peng Gao, Hao Dong, Yin Zhuang, Guanqun Wang, Wei Zhang, He Chen

Figure 1 for Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

Figure 2 for Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

Figure 3 for Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

Figure 4 for Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

Abstract:Currently, under supervised learning, a model pretrained by a large-scale nature scene dataset and then fine-tuned on a few specific task labeling data is the paradigm that has dominated the knowledge transfer learning. It has reached the status of consensus solution for task-aware model training in remote sensing domain (RSD). Unfortunately, due to different categories of imaging data and stiff challenges of data annotation, there is not a large enough and uniform remote sensing dataset to support large-scale pretraining in RSD. Moreover, pretraining models on large-scale nature scene datasets by supervised learning and then directly fine-tuning on diverse downstream tasks seems to be a crude method, which is easily affected by inevitable labeling noise, severe domain gaps and task-aware discrepancies. Thus, in this paper, considering the self-supervised pretraining and powerful vision transformer (ViT) architecture, a concise and effective knowledge transfer learning strategy called ConSecutive PreTraining (CSPT) is proposed based on the idea of not stopping pretraining in natural language processing (NLP), which can gradually bridge the domain gap and transfer knowledge from the nature scene domain to the RSD. The proposed CSPT also can release the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments are carried out on twelve datasets in RSD involving three types of downstream tasks (e.g., scene classification, object detection and land cover classification) and two types of imaging data (e.g., optical and SAR). The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in RSD can outperform the previous method of supervised pretraining-then-fine-tuning and even surpass the state-of-the-art (SOTA) performance without any expensive labeling consumption and careful model design.

* 20 pages,9 figures

Via

Access Paper or Ask Questions