Abstract:The emerging programmable networks sparked significant research on Intelligent Network Data Plane (INDP), which achieves learning-based traffic analysis at line-speed. Prior art in INDP focus on deploying tree/forest models on the data plane. We observe a fundamental limitation in tree-based INDP approaches: although it is possible to represent even larger tree/forest tables on the data plane, the flow features that are computable on the data plane are fundamentally limited by hardware constraints. In this paper, we present BoS to push the boundaries of INDP by enabling Neural Network (NN) driven traffic analysis at line-speed. Many types of NNs (such as Recurrent Neural Network (RNN), and transformers) that are designed to work with sequential data have advantages over tree-based models, because they can take raw network data as input without complex feature computations on the fly. However, the challenge is significant: the recurrent computation scheme used in RNN inference is fundamentally different from the match-action paradigm used on the network data plane. BoS addresses this challenge by (i) designing a novel data plane friendly RNN architecture that can execute unlimited RNN time steps with limited data plane stages, effectively achieving line-speed RNN inference; and (ii) complementing the on-switch RNN model with an off-switch transformer-based traffic analysis module to further boost the overall performance. We implement a prototype of BoS using a P4 programmable switch as our data plane, and extensively evaluate it over multiple traffic analysis tasks. The results show that BoS outperforms state-of-the-art in both analysis accuracy and scalability.
Abstract:The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cryptographic constructs like homomorphic encryption (HE) and secure multiparty computation (MPC). However, FL completely overlooks model privacy, and HE has limited extensibility (confined to only one data provider). While the state-of-the-art MPC frameworks provide reasonable throughput and simultaneously ensure model/data privacy, they rely on a critical non-colluding assumption on the computing servers, and relaxing this assumption is still an open problem. In this paper, we present Pencil, the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption. Our fundamental design principle is to construct the n-party collaborative training protocol based on an efficient two-party protocol, and meanwhile ensuring that switching to different data providers during model training introduces no extra cost. We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis. Our comprehensive evaluations of Pencil demonstrate that (i) models trained in plaintext and models trained privately using Pencil exhibit nearly identical test accuracies; (ii) The training overhead of Pencil is greatly reduced: Pencil achieves 10 ~ 260x higher throughput and 2 orders of magnitude less communication than prior art; (iii) Pencil is resilient against both existing and adaptive (white-box) attacks.
Abstract:Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions. However, this naive weighting method may lead to unfairness and degradation in model performance due to statistical heterogeneity and the inclusion of noisy data among clients. Theoretically, distributional robustness analysis has shown that the generalization performance of a learning model with respect to any shifted distribution is bounded. This motivates us to reconsider the weighting approach in federated learning. In this paper, we replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model. Specifically, we estimate the upper and lower bounds of the second-order origin moment of the shifted distribution for the current local model, and then use these bounds disagreements as the aggregation proportions for weightings in each communication round. Experiments demonstrate that the proposed weighting strategy significantly improves the performance of several representative FL algorithms on benchmark datasets.
Abstract:Robust network design, which aims to guarantee network availability under various failure scenarios while optimizing performance/cost objectives, has received significant attention. Existing approaches often rely on model-based mixed-integer optimization that is hard to scale or employ deep learning to solve specific engineering problems yet with limited generalizability. In this paper, we show that failure evaluation provides a common kernel to improve the tractability and scalability of existing solutions. By providing a neural network function approximation of this common kernel using graph attention networks, we develop a unified learning-based framework, FERN, for scalable Failure Evaluation and Robust Network design. FERN represents rich problem inputs as a graph and captures both local and global views by attentively performing feature extraction from the graph. It enables a broad range of robust network design problems, including robust network validation, network upgrade optimization, and fault-tolerant traffic engineering that are discussed in this paper, to be recasted with respect to the common kernel and thus computed efficiently using neural networks and over a small set of critical failure scenarios. Extensive experiments on real-world network topologies show that FERN can efficiently and accurately identify key failure scenarios for both OSPF and optimal routing scheme, and generalizes well to different topologies and input traffic patterns. It can speed up multiple robust network design problems by more than 80x, 200x, 10x, respectively with negligible performance gap.
Abstract:Graph Neural Networks (GNNs) have achieved state-of-the-art performance in various graph structure related tasks such as node classification and graph classification. However, GNNs are vulnerable to adversarial attacks. Existing works mainly focus on attacking GNNs for node classification; nevertheless, the attacks against GNNs for graph classification have not been well explored. In this work, we conduct a systematic study on adversarial attacks against GNNs for graph classification via perturbing the graph structure. In particular, we focus on the most challenging attack, i.e., hard label black-box attack, where an attacker has no knowledge about the target GNN model and can only obtain predicted labels through querying the target model.To achieve this goal, we formulate our attack as an optimization problem, whose objective is to minimize the number of edges to be perturbed in a graph while maintaining the high attack success rate. The original optimization problem is intractable to solve, and we relax the optimization problem to be a tractable one, which is solved with theoretical convergence guarantee. We also design a coarse-grained searching algorithm and a query-efficient gradient computation algorithm to decrease the number of queries to the target GNN model. Our experimental results on three real-world datasets demonstrate that our attack can effectively attack representative GNNs for graph classification with less queries and perturbations. We also evaluate the effectiveness of our attack under two defenses: one is well-designed adversarial graph detector and the other is that the target GNN model itself is equipped with a defense to prevent adversarial graph generation. Our experimental results show that such defenses are not effective enough, which highlights more advanced defenses.
Abstract:Robot arm placements are oftentimes a limitation in surgical preoperative procedures, relying on trained staff to evaluate and decide on the optimal positions for the arms. Given new and different patient anatomies, it can be challenging to make an informed choice, leading to more frequently colliding arms or limited manipulator workspaces. In this paper, we develop a method to generate the optimal manipulator base positions for the multi-port da Vinci surgical system that minimizes self-collision and environment-collision, and maximizes the surgeon's reachability inside the patient. Scoring functions are defined for each criterion so that they may be optimized over. Since for multi-manipulator setups, a large number of free parameters are available to adjust the base positioning of each arm, a challenge becomes how one can expediently assess possible setups. We thus also propose methods that perform fast queries of each measure with the use of a proxy collision-checker. We then develop an optimization method to determine the optimal position using the scoring functions. We evaluate the optimality of the base positions for the robot arms on canonical trajectories, and show that the solution yielded by the optimization program can satisfy each criterion. The metrics and optimization strategy are generalizable to other surgical robotic platforms so that patient-side manipulator positioning may be optimized and solved.
Abstract:Recommender systems play a crucial role in helping users to find their interested information in various web services such as Amazon, YouTube, and Google News. Various recommender systems, ranging from neighborhood-based, association-rule-based, matrix-factorization-based, to deep learning based, have been developed and deployed in industry. Among them, deep learning based recommender systems become increasingly popular due to their superior performance. In this work, we conduct the first systematic study on data poisoning attacks to deep learning based recommender systems. An attacker's goal is to manipulate a recommender system such that the attacker-chosen target items are recommended to many users. To achieve this goal, our attack injects fake users with carefully crafted ratings to a recommender system. Specifically, we formulate our attack as an optimization problem, such that the injected ratings would maximize the number of normal users to whom the target items are recommended. However, it is challenging to solve the optimization problem because it is a non-convex integer programming problem. To address the challenge, we develop multiple techniques to approximately solve the optimization problem. Our experimental results on three real-world datasets, including small and large datasets, show that our attack is effective and outperforms existing attacks. Moreover, we attempt to detect fake users via statistical analysis of the rating patterns of normal and fake users. Our results show that our attack is still effective and outperforms existing attacks even if such a detector is deployed.
Abstract:While deep learning (DL)-based networked systems have shown great potential in various applications, a key drawback is that Deep Neural Networks (DNNs) in DL are blackboxes and nontransparent for network operators. The lack of interpretability makes DL-based networked systems challenging to operate and troubleshoot, which further prevents DL-based networked systems from deploying in practice. In this paper, we propose TranSys, a novel framework to explain DL-based networked systems for practical deployment. Transys categorizes current DL-based networked systems and introduces different explanation methods based on decision tree and hypergraph to effectively explain DL-based networked systems. TranSys can explain the DNN policies in the form of decision trees and highlight critical components based on analysis over hypergraph. We evaluate TranSys over several typical DL-based networked systems and demonstrate that Transys can provide human-readable explanations for network operators. We also present three use cases of Transys, which could (i) help network operators troubleshoot DL-based networked systems, (ii) improve the decision latency and resource consumption of DL-based networked systems by ~10x on different metrics, and (iii) provide suggestions on daily operations for network operators when incidences occur.