Abstract: Robot crowd navigation has been gaining increasing attention in various practical applications. In existing research, deep reinforcement learning has been applied to robot crowd navigation by training policies in an online mode. However, this inevitably leads to unsafe exploration and, consequently, low sample efficiency during pedestrian-robot interaction. To this end, we propose an offline reinforcement learning-based robot crowd navigation algorithm that utilizes pre-collected crowd navigation experience. Specifically, the algorithm integrates a spatial-temporal state into implicit Q-learning to avoid querying robot actions that lie outside the distribution of the pre-collected experience, while capturing spatial-temporal features from the offline pedestrian-robot interactions. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in both qualitative and quantitative analyses.
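A minimal sketch of the implicit Q-learning losses referenced above, assuming the standard expectile-regression formulation; the networks `q_net`, `q_target`, `v_net`, the batch layout, and the spatial-temporal encoder are hypothetical placeholders, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def iql_losses(q_net, q_target, v_net, batch, tau=0.7, gamma=0.99):
    # `batch` holds spatial-temporal states assumed to be produced by an
    # encoder of the offline pedestrian-robot interactions (not shown).
    s, a, r, s_next, done = batch

    # Expectile regression fits V(s) using only actions present in the
    # dataset, so no out-of-distribution robot actions are ever queried.
    with torch.no_grad():
        q_sa = q_target(s, a)
    diff = q_sa - v_net(s)
    weight = torch.abs(tau - (diff < 0).float())
    v_loss = (weight * diff.pow(2)).mean()

    # The TD target bootstraps through V(s') instead of max_a' Q(s', a'),
    # again avoiding evaluation of unseen actions.
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * v_net(s_next)
    q_loss = F.mse_loss(q_net(s, a), td_target)
    return v_loss, q_loss
```

In full IQL, a policy would then be extracted by advantage-weighted regression over the same offline batch; that step is omitted here for brevity.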
Abstract: Model inversion attacks involve reconstructing the training data of a target model, which raises serious privacy concerns for machine learning models. However, these attacks, especially learning-based methods, tend to suffer from low attack accuracy, i.e., low classification accuracy of the reconstructed data by machine learning classifiers. Recent studies have shown that an alternative strategy for model inversion attacks, GAN-based optimization, can improve attack accuracy effectively. However, this series of GAN-based attacks reconstructs only class-representative training data for each class, whereas learning-based attacks can reconstruct diverse data corresponding to different training records within each class. Hence, in this paper, we propose a new training paradigm for learning-based model inversion attacks that achieves higher attack accuracy in a black-box setting. First, we regularize the training process of the attack model with an added semantic loss function; second, we inject adversarial examples into the training data to increase the diversity of the class-related parts (i.e., the essential features for classification tasks) of the training data. This scheme guides the attack model to pay more attention to the class-related parts of the original data during the reconstruction process. The experimental results show that our method greatly boosts the performance of existing learning-based model inversion attacks. Even when no extra queries to the target model are allowed, the approach can still improve the attack accuracy of the reconstructed data. This new attack shows that the severity of the threat posed by learning-based model inversion adversaries has been underestimated and that more robust defenses are required.
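As a hedged illustration of the two ingredients named above (semantic-loss regularization and adversarial-example injection), the sketch below uses a locally trained `surrogate` wherever gradients through the black-box target would be needed; the surrogate, `lam`, and `eps` are assumptions, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

def attack_step(attack_net, surrogate, x, conf, opt, lam=0.1, eps=0.03):
    # Adversarial-example injection: perturb inputs so the attack model
    # sees more diverse class-related features during training.
    with torch.no_grad():
        labels = surrogate(x).argmax(dim=1)
    x_adv = x.detach().clone().requires_grad_(True)
    F.cross_entropy(surrogate(x_adv), labels).backward()
    x_adv = (x + eps * x_adv.grad.sign()).clamp(0, 1).detach()

    x_rec = attack_net(conf)             # invert the confidence vectors
    rec_loss = F.mse_loss(x_rec, x)      # pixel-level fidelity
    with torch.no_grad():
        conf_adv = surrogate(x_adv).softmax(dim=1)
    rec_adv = F.mse_loss(attack_net(conf_adv), x_adv)  # injected data

    # Semantic loss: the reconstruction should still be classified like
    # the original, i.e., it must preserve the class-related parts.
    sem_loss = F.cross_entropy(surrogate(x_rec), labels)

    loss = rec_loss + rec_adv + lam * sem_loss
    opt.zero_grad()          # `opt` holds only attack_net's parameters;
    loss.backward()          # the surrogate is never updated.
    opt.step()
    return loss.item()
```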
Abstract: Social robot navigation is an open and challenging problem. Existing work uses separate modules to capture spatial and temporal features, which makes it difficult to fully exploit spatio-temporal features and to reduce the conservatism of the navigation policy. In light of this, we present a spatio-temporal Transformer-based policy optimization algorithm that enhances the utilization of spatio-temporal features, thereby facilitating the capture of human-robot interactions. Specifically, this paper introduces a gated embedding mechanism that effectively aligns the spatial and temporal representations by integrating both modalities at the feature level. A Transformer is then leveraged to encode the spatio-temporal semantic information, with the aim of finding the optimal navigation policy. Finally, the combination of the spatio-temporal Transformer and self-adjusting policy entropy significantly reduces the conservatism of the navigation policy. Experimental results demonstrate the effectiveness of the proposed framework, with our method showing superior performance.
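A minimal sketch of a gated embedding of the kind described above, assuming a sigmoid-gated convex combination at the feature level; the dimensions and the downstream Transformer configuration are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GatedSpatioTemporalFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, spatial, temporal):
        # The gate decides, per feature, how much of each modality to keep.
        g = torch.sigmoid(self.gate(torch.cat([spatial, temporal], dim=-1)))
        return g * spatial + (1.0 - g) * temporal

# The fused tokens can then feed a standard Transformer encoder:
fusion = GatedSpatioTemporalFusion(dim=64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
tokens = fusion(torch.randn(8, 5, 64), torch.randn(8, 5, 64))  # (batch, humans, dim)
out = encoder(tokens)  # spatio-temporal semantics for the policy head
```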
Abstract: Transfer learning is an important approach that produces pre-trained teacher models from which specialized student models can be quickly built. Recent research has found that transfer learning is vulnerable to various attacks, e.g., misclassification and backdoor attacks; however, it remains unclear whether it is vulnerable to model inversion attacks. Launching a model inversion attack against a transfer learning scheme is challenging: the student model not only hides its structural parameters but is also inaccessible to the adversary. Hence, when targeting a student model, both the white-box and black-box versions of existing model inversion attacks fail. White-box attacks fail because they need the target model's parameters; black-box attacks fail because they depend on making repeated queries to the target model. However, these failures do not mean that transfer learning models are impervious to model inversion attacks. Hence, with this paper, we initiate research into model inversion attacks against transfer learning schemes with two novel attack methods. Both are black-box attacks suited to different situations, and neither relies on queries to the target student model. In the first method, the adversary has data samples that share the same distribution as the training set of the teacher model; in the second method, the adversary has no such samples. Experiments show that highly recognizable data records can be recovered with both methods. This means that even if a model is an inaccessible black box, it can still be inverted.
Abstract: In a model inversion attack, an adversary attempts to reconstruct the data records used to train a target model using only the model's output. Contemporary model inversion attacks generally rely on either predicted confidence score vectors, i.e., black-box attacks, or the parameters of the target model, i.e., white-box attacks. In the real world, however, model owners usually release only the predicted labels; the confidence score vectors and model parameters are hidden as a defense mechanism to prevent such attacks. Unfortunately, we have found a model inversion method that can reconstruct the input data records based solely on the output labels. We believe this is the attack that requires the least information to succeed and, therefore, has the broadest applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to the decision boundary of the target model. This distance is then used to generate confidence score vectors, which are adopted to train an attack model to reconstruct the data records. The experimental results show that highly recognizable data records can be reconstructed with far less information than existing methods require.
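To make the key idea concrete, the sketch below turns hard labels plus an estimated median distance to the decision boundary into synthetic confidence vectors for training the attack model; the exact distance-to-confidence mapping is an assumption (the abstract does not specify it), and the error-rate-based distance estimation is not shown:

```python
import numpy as np

def synth_confidences(labels, n_classes, median_dist, temperature=1.0):
    # A larger estimated distance from the decision boundary is mapped
    # to a more peaked (more confident) synthetic score vector.
    margin = median_dist / temperature
    logits = np.full((len(labels), n_classes), -margin)
    logits[np.arange(len(labels)), labels] = margin
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# Labels come from the black-box target; the median distance would be
# derived from its error rate as described above.
conf = synth_confidences(np.array([0, 2, 1]), n_classes=3, median_dist=1.5)
```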
Abstract: Increasing concerns about data privacy and security have driven the emergence of a new field that studies privacy-preserving machine learning from isolated data sources, i.e., \textit{federated learning}. Vertical federated learning, in which different parties hold different features for common users, has great potential to drive a wider variety of business cooperation among enterprises in different fields. Decision tree models, especially decision tree ensembles, are a class of widely applied, powerful machine learning models with high interpretability and modeling efficiency. However, interpretability is compromised in existing works such as SecureBoost, since feature names are not exposed in order to avoid possible data breaches through the unprotected decision path. In this paper, we propose Fed-EINI, an efficient and interpretable inference framework for federated decision tree models that requires only one round of multi-party communication. Each party computes a candidate set of leaf nodes from its local data in parallel, after which the weight of the single leaf node in the intersection of the candidate sets is computed securely. We propose to protect the decision path with an efficient additively homomorphic encryption method, which allows the disclosure of feature names and thus makes the federated decision trees interpretable. The advantages of Fed-EINI are demonstrated through theoretical analysis and extensive numerical results. Experiments show that inference efficiency is improved by over $50\%$ on average.
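A plaintext sketch of the candidate-set idea described above; the node layout is hypothetical and, unlike Fed-EINI proper, the leaf weight here is read directly rather than recovered under additively homomorphic encryption:

```python
from functools import reduce

def local_candidate_leaves(node, features):
    # Leaves still reachable given only this party's features. When a
    # split feature is held by another party, both children survive.
    if node["leaf"]:
        return {node["id"]}
    if node["feature"] in features:
        side = "left" if features[node["feature"]] < node["threshold"] else "right"
        return local_candidate_leaves(node[side], features)
    return (local_candidate_leaves(node["left"], features)
            | local_candidate_leaves(node["right"], features))

def predict(parties_candidates, leaf_weights):
    # Each party computes its candidate set in parallel (one round of
    # communication); the intersection contains exactly one leaf.
    (leaf,) = reduce(set.intersection, parties_candidates)
    return leaf_weights[leaf]
```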
Abstract: Recently, the popularity of depth sensors such as Kinect has made depth videos easily available, yet their advantages have not been fully exploited. This paper investigates gesture recognition by exploring the complementary spatial and temporal information embedded in RGB and depth sequences. We propose a convolutional two-stream consensus voting network (2SCVN) that explicitly models both the short-term and long-term structure of the RGB sequences. To alleviate distractions from the background, a 3D depth-saliency ConvNet stream (3DDSN) is aggregated in parallel to identify subtle motion characteristics. These two components in a unified framework significantly improve recognition accuracy. On the challenging ChaLearn IsoGD benchmark, our proposed method outperforms the first place on the leaderboard by a large margin (10.29%), while also achieving the best result on the RGBD-HuDaAct dataset (96.74%). Both quantitative experiments and qualitative analysis show the effectiveness of our proposed framework, and code will be released to facilitate future research.
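A minimal sketch of the late-fusion consensus voting suggested by the model names above, assuming per-snippet softmax scores are averaged within each stream and the two streams are combined by a weighted sum; the fusion weight and snippet count are assumptions:

```python
import torch

def consensus_vote(rgb_scores, depth_scores, w_rgb=0.5):
    # rgb_scores / depth_scores: (snippets, classes) per-snippet class
    # scores from the 2SCVN and 3DDSN streams, respectively.
    rgb_video = rgb_scores.softmax(dim=-1).mean(dim=0)      # snippet votes
    depth_video = depth_scores.softmax(dim=-1).mean(dim=0)
    fused = w_rgb * rgb_video + (1.0 - w_rgb) * depth_video
    return fused.argmax().item()

# IsoGD has 249 gesture classes; 7 snippets sampled per video here.
pred = consensus_vote(torch.randn(7, 249), torch.randn(7, 249))
```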