Abstract: Robot crowd navigation has been gaining increasing attention in various practical applications. In existing research, deep reinforcement learning has been applied to robot crowd navigation by training policies in an online mode. However, this inevitably leads to unsafe exploration and, consequently, low sample efficiency during pedestrian-robot interaction. To this end, we propose an offline reinforcement learning-based robot crowd navigation algorithm that utilizes pre-collected crowd navigation experience. Specifically, the algorithm integrates a spatial-temporal state into implicit Q-learning to avoid querying robot actions that lie outside the distribution of the pre-collected experience, while capturing spatial-temporal features from the offline pedestrian-robot interactions. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in both qualitative and quantitative analyses.
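A minimal sketch of the implicit Q-learning losses referenced above, assuming the standard expectile-regression formulation; the networks `q_net`, `q_target`, `v_net`, the batch layout, and the spatial-temporal encoder are hypothetical placeholders, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def iql_losses(q_net, q_target, v_net, batch, tau=0.7, gamma=0.99):
    # `batch` holds spatial-temporal states assumed to be produced by an
    # encoder of the offline pedestrian-robot interactions (not shown).
    s, a, r, s_next, done = batch

    # Expectile regression fits V(s) using only actions present in the
    # dataset, so no out-of-distribution robot actions are ever queried.
    with torch.no_grad():
        q_sa = q_target(s, a)
    diff = q_sa - v_net(s)
    weight = torch.abs(tau - (diff < 0).float())
    v_loss = (weight * diff.pow(2)).mean()

    # The TD target bootstraps through V(s') instead of max_a' Q(s', a'),
    # again avoiding evaluation of unseen actions.
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * v_net(s_next)
    q_loss = F.mse_loss(q_net(s, a), td_target)
    return v_loss, q_loss
```

In full IQL, a policy would then be extracted by advantage-weighted regression over the same offline batch; that step is omitted here for brevity.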
Abstract: Model inversion attacks involve reconstructing the training data of a target model, which raises serious privacy concerns for machine learning models. However, these attacks, especially learning-based methods, tend to suffer from low attack accuracy, i.e., low classification accuracy of the reconstructed data by machine learning classifiers. Recent studies have shown that an alternative strategy for model inversion attacks, GAN-based optimization, can improve attack accuracy effectively. However, this series of GAN-based attacks reconstructs only class-representative training data for each class, whereas learning-based attacks can reconstruct diverse data corresponding to different training records within each class. Hence, in this paper, we propose a new training paradigm for learning-based model inversion attacks that achieves higher attack accuracy in a black-box setting. First, we regularize the training process of the attack model with an added semantic loss function; second, we inject adversarial examples into the training data to increase the diversity of the class-related parts (i.e., the essential features for classification tasks) of the training data. This scheme guides the attack model to pay more attention to the class-related parts of the original data during the reconstruction process. The experimental results show that our method greatly boosts the performance of existing learning-based model inversion attacks. Even when no extra queries to the target model are allowed, the approach can still improve the attack accuracy of the reconstructed data. This new attack shows that the severity of the threat posed by learning-based model inversion adversaries has been underestimated and that more robust defenses are required.
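As a hedged illustration of the two ingredients named above (semantic-loss regularization and adversarial-example injection), the sketch below uses a locally trained `surrogate` wherever gradients through the black-box target would be needed; the surrogate, `lam`, and `eps` are assumptions, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

def attack_step(attack_net, surrogate, x, conf, opt, lam=0.1, eps=0.03):
    # Adversarial-example injection: perturb inputs so the attack model
    # sees more diverse class-related features during training.
    with torch.no_grad():
        labels = surrogate(x).argmax(dim=1)
    x_adv = x.detach().clone().requires_grad_(True)
    F.cross_entropy(surrogate(x_adv), labels).backward()
    x_adv = (x + eps * x_adv.grad.sign()).clamp(0, 1).detach()

    x_rec = attack_net(conf)             # invert the confidence vectors
    rec_loss = F.mse_loss(x_rec, x)      # pixel-level fidelity
    with torch.no_grad():
        conf_adv = surrogate(x_adv).softmax(dim=1)
    rec_adv = F.mse_loss(attack_net(conf_adv), x_adv)  # injected data

    # Semantic loss: the reconstruction should still be classified like
    # the original, i.e., it must preserve the class-related parts.
    sem_loss = F.cross_entropy(surrogate(x_rec), labels)

    loss = rec_loss + rec_adv + lam * sem_loss
    opt.zero_grad()          # `opt` holds only attack_net's parameters;
    loss.backward()          # the surrogate is never updated.
    opt.step()
    return loss.item()
```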
Abstract: Social robot navigation is an open and challenging problem. Existing work uses separate modules to capture spatial and temporal features, which makes it difficult to fully exploit spatio-temporal features and to reduce the conservatism of the navigation policy. In light of this, we present a spatio-temporal Transformer-based policy optimization algorithm that enhances the utilization of spatio-temporal features, thereby facilitating the capture of human-robot interactions. Specifically, this paper introduces a gated embedding mechanism that effectively aligns the spatial and temporal representations by integrating both modalities at the feature level. A Transformer is then leveraged to encode the spatio-temporal semantic information, with the aim of finding the optimal navigation policy. Finally, the combination of the spatio-temporal Transformer and self-adjusting policy entropy significantly reduces the conservatism of the navigation policy. Experimental results demonstrate the effectiveness of the proposed framework, with our method showing superior performance.
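A minimal sketch of a gated embedding of the kind described above, assuming a sigmoid-gated convex combination at the feature level; the dimensions and the downstream Transformer configuration are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GatedSpatioTemporalFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, spatial, temporal):
        # The gate decides, per feature, how much of each modality to keep.
        g = torch.sigmoid(self.gate(torch.cat([spatial, temporal], dim=-1)))
        return g * spatial + (1.0 - g) * temporal

# The fused tokens can then feed a standard Transformer encoder:
fusion = GatedSpatioTemporalFusion(dim=64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
tokens = fusion(torch.randn(8, 5, 64), torch.randn(8, 5, 64))  # (batch, humans, dim)
out = encoder(tokens)  # spatio-temporal semantics for the policy head
```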
Abstract: Transfer learning is an important approach that produces pre-trained teacher models from which specialized student models can be quickly built. Recent research has found that transfer learning is vulnerable to various attacks, e.g., misclassification and backdoor attacks; however, it remains unclear whether it is vulnerable to model inversion attacks. Launching a model inversion attack against a transfer learning scheme is challenging: the student model not only hides its structural parameters but is also inaccessible to the adversary. Hence, when targeting a student model, both the white-box and black-box versions of existing model inversion attacks fail. White-box attacks fail because they need the target model's parameters; black-box attacks fail because they depend on making repeated queries to the target model. However, these failures do not mean that transfer learning models are impervious to model inversion attacks. Hence, with this paper, we initiate research into model inversion attacks against transfer learning schemes with two novel attack methods. Both are black-box attacks suited to different situations, and neither relies on queries to the target student model. In the first method, the adversary has data samples that share the same distribution as the training set of the teacher model; in the second method, the adversary has no such samples. Experiments show that highly recognizable data records can be recovered with both methods. This means that even if a model is an inaccessible black box, it can still be inverted.
Abstract: In a model inversion attack, an adversary attempts to reconstruct the data records used to train a target model using only the model's output. Contemporary model inversion attacks generally rely on either predicted confidence score vectors, i.e., black-box attacks, or the parameters of the target model, i.e., white-box attacks. In the real world, however, model owners usually release only the predicted labels; the confidence score vectors and model parameters are hidden as a defense mechanism to prevent such attacks. Unfortunately, we have found a model inversion method that can reconstruct the input data records based solely on the output labels. We believe this is the attack that requires the least information to succeed and, therefore, has the broadest applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to the decision boundary of the target model. This distance is then used to generate confidence score vectors, which are adopted to train an attack model to reconstruct the data records. The experimental results show that highly recognizable data records can be reconstructed with far less information than existing methods require.
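To make the key idea concrete, the sketch below turns hard labels plus an estimated median distance to the decision boundary into synthetic confidence vectors for training the attack model; the exact distance-to-confidence mapping is an assumption (the abstract does not specify it), and the error-rate-based distance estimation is not shown:

```python
import numpy as np

def synth_confidences(labels, n_classes, median_dist, temperature=1.0):
    # A larger estimated distance from the decision boundary is mapped
    # to a more peaked (more confident) synthetic score vector.
    margin = median_dist / temperature
    logits = np.full((len(labels), n_classes), -margin)
    logits[np.arange(len(labels)), labels] = margin
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# Labels come from the black-box target; the median distance would be
# derived from its error rate as described above.
conf = synth_confidences(np.array([0, 2, 1]), n_classes=3, median_dist=1.5)
```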
Abstract: Increasing concerns about data privacy and security have driven the emergence of a new field that studies privacy-preserving machine learning from isolated data sources, i.e., \textit{federated learning}. Vertical federated learning, in which different parties hold different features for common users, has great potential to drive a wider variety of business cooperation among enterprises in different fields. Decision tree models, especially decision tree ensembles, are a class of widely applied, powerful machine learning models with high interpretability and modeling efficiency. However, interpretability is compromised in existing works such as SecureBoost, since feature names are not exposed in order to avoid possible data breaches through the unprotected decision path. In this paper, we propose Fed-EINI, an efficient and interpretable inference framework for federated decision tree models that requires only one round of multi-party communication. Each party computes a candidate set of leaf nodes from its local data in parallel, after which the weight of the single leaf node in the intersection of the candidate sets is computed securely. We propose to protect the decision path with an efficient additively homomorphic encryption method, which allows the disclosure of feature names and thus makes the federated decision trees interpretable. The advantages of Fed-EINI are demonstrated through theoretical analysis and extensive numerical results. Experiments show that inference efficiency is improved by over $50\%$ on average.
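A plaintext sketch of the candidate-set idea described above; the node layout is hypothetical and, unlike Fed-EINI proper, the leaf weight here is read directly rather than recovered under additively homomorphic encryption:

```python
from functools import reduce

def local_candidate_leaves(node, features):
    # Leaves still reachable given only this party's features. When a
    # split feature is held by another party, both children survive.
    if node["leaf"]:
        return {node["id"]}
    if node["feature"] in features:
        side = "left" if features[node["feature"]] < node["threshold"] else "right"
        return local_candidate_leaves(node[side], features)
    return (local_candidate_leaves(node["left"], features)
            | local_candidate_leaves(node["right"], features))

def predict(parties_candidates, leaf_weights):
    # Each party computes its candidate set in parallel (one round of
    # communication); the intersection contains exactly one leaf.
    (leaf,) = reduce(set.intersection, parties_candidates)
    return leaf_weights[leaf]
```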
Abstract: Recently, the popularity of depth sensors such as Kinect has made depth videos easily available, yet their advantages have not been fully exploited. This paper investigates gesture recognition by exploring the complementary spatial and temporal information embedded in RGB and depth sequences. We propose a convolutional two-stream consensus voting network (2SCVN) that explicitly models both the short-term and long-term structure of the RGB sequences. To alleviate distractions from the background, a 3D depth-saliency ConvNet stream (3DDSN) is aggregated in parallel to identify subtle motion characteristics. These two components in a unified framework significantly improve recognition accuracy. On the challenging ChaLearn IsoGD benchmark, our proposed method outperforms the first place on the leaderboard by a large margin (10.29%), while also achieving the best result on the RGBD-HuDaAct dataset (96.74%). Both quantitative experiments and qualitative analysis show the effectiveness of our proposed framework, and code will be released to facilitate future research.
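A minimal sketch of the late-fusion consensus voting suggested by the model names above, assuming per-snippet softmax scores are averaged within each stream and the two streams are combined by a weighted sum; the fusion weight and snippet count are assumptions:

```python
import torch

def consensus_vote(rgb_scores, depth_scores, w_rgb=0.5):
    # rgb_scores / depth_scores: (snippets, classes) per-snippet class
    # scores from the 2SCVN and 3DDSN streams, respectively.
    rgb_video = rgb_scores.softmax(dim=-1).mean(dim=0)      # snippet votes
    depth_video = depth_scores.softmax(dim=-1).mean(dim=0)
    fused = w_rgb * rgb_video + (1.0 - w_rgb) * depth_video
    return fused.argmax().item()

# IsoGD has 249 gesture classes; 7 snippets sampled per video here.
pred = consensus_vote(torch.randn(7, 249), torch.randn(7, 249))
```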