Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingwei Xu

Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs

Jun 04, 2025

Wanyun Cui, Mingwei Xu

Abstract:Recent advances in Large Language Models (LLMs) have highlighted the critical importance of extending context length, yet the quadratic complexity of attention mechanisms poses significant challenges for efficient long-context modeling. KV cache compression has emerged as a key approach to address this challenge. Through extensive empirical analysis, we reveal a fundamental yet previously overlooked asymmetry in KV caches: while adjacent keys receive similar attention weights (local homogeneity), adjacent values demonstrate distinct heterogeneous distributions. This key-value asymmetry reveals a critical limitation in existing compression methods that treat keys and values uniformly. To address the limitation, we propose a training-free compression framework (AsymKV) that combines homogeneity-based key merging with a mathematically proven lossless value compression. Extensive experiments demonstrate that AsymKV consistently outperforms existing long-context methods across various tasks and base models. For example, on LLaMA3.1-8B, AsymKV achieves an average score of 43.95 on LongBench, surpassing SOTA methods like H$_2$O (38.89) by a large margin.

* 14 pages,7 figures

Via

Access Paper or Ask Questions

Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed

Mar 17, 2024

Jinzhu Yan, Haotian Xu, Zhuotao Liu, Qi Li, Ke Xu, Mingwei Xu, Jianping Wu

Figure 1 for Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed

Figure 2 for Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed

Figure 3 for Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed

Figure 4 for Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed

Abstract:The emerging programmable networks sparked significant research on Intelligent Network Data Plane (INDP), which achieves learning-based traffic analysis at line-speed. Prior art in INDP focus on deploying tree/forest models on the data plane. We observe a fundamental limitation in tree-based INDP approaches: although it is possible to represent even larger tree/forest tables on the data plane, the flow features that are computable on the data plane are fundamentally limited by hardware constraints. In this paper, we present BoS to push the boundaries of INDP by enabling Neural Network (NN) driven traffic analysis at line-speed. Many types of NNs (such as Recurrent Neural Network (RNN), and transformers) that are designed to work with sequential data have advantages over tree-based models, because they can take raw network data as input without complex feature computations on the fly. However, the challenge is significant: the recurrent computation scheme used in RNN inference is fundamentally different from the match-action paradigm used on the network data plane. BoS addresses this challenge by (i) designing a novel data plane friendly RNN architecture that can execute unlimited RNN time steps with limited data plane stages, effectively achieving line-speed RNN inference; and (ii) complementing the on-switch RNN model with an off-switch transformer-based traffic analysis module to further boost the overall performance. We implement a prototype of BoS using a P4 programmable switch as our data plane, and extensively evaluate it over multiple traffic analysis tasks. The results show that BoS outperforms state-of-the-art in both analysis accuracy and scalability.

* 12 pages body, 22 pages total, 14 figures, accepted by the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI'24)

Via

Access Paper or Ask Questions

Pencil: Private and Extensible Collaborative Learning without the Non-Colluding Assumption

Mar 17, 2024

Xuanqi Liu, Zhuotao Liu, Qi Li, Ke Xu, Mingwei Xu

Abstract:The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cryptographic constructs like homomorphic encryption (HE) and secure multiparty computation (MPC). However, FL completely overlooks model privacy, and HE has limited extensibility (confined to only one data provider). While the state-of-the-art MPC frameworks provide reasonable throughput and simultaneously ensure model/data privacy, they rely on a critical non-colluding assumption on the computing servers, and relaxing this assumption is still an open problem. In this paper, we present Pencil, the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption. Our fundamental design principle is to construct the n-party collaborative training protocol based on an efficient two-party protocol, and meanwhile ensuring that switching to different data providers during model training introduces no extra cost. We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis. Our comprehensive evaluations of Pencil demonstrate that (i) models trained in plaintext and models trained privately using Pencil exhibit nearly identical test accuracies; (ii) The training overhead of Pencil is greatly reduced: Pencil achieves 10 ~ 260x higher throughput and 2 orders of magnitude less communication than prior art; (iii) Pencil is resilient against both existing and adaptive (white-box) attacks.

* Proceedings 2024 Network and Distributed System Security Symposium (2024)
* Network and Distributed System Security Symposium (NDSS) 2024

Via

Access Paper or Ask Questions

Aggregation Weighting of Federated Learning via Generalization Bound Estimation

Nov 10, 2023

Mingwei Xu, Xiaofeng Cao, Ivor W. Tsang, James T. Kwok

Abstract:Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions. However, this naive weighting method may lead to unfairness and degradation in model performance due to statistical heterogeneity and the inclusion of noisy data among clients. Theoretically, distributional robustness analysis has shown that the generalization performance of a learning model with respect to any shifted distribution is bounded. This motivates us to reconsider the weighting approach in federated learning. In this paper, we replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model. Specifically, we estimate the upper and lower bounds of the second-order origin moment of the shifted distribution for the current local model, and then use these bounds disagreements as the aggregation proportions for weightings in each communication round. Experiments demonstrate that the proposed weighting strategy significantly improves the performance of several representative FL algorithms on benchmark datasets.

Via

Access Paper or Ask Questions

FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design

May 30, 2023

Chenyi Liu, Vaneet Aggarwal, Tian Lan, Nan Geng, Yuan Yang, Mingwei Xu, Qing Li

Figure 1 for FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design

Figure 2 for FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design

Figure 3 for FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design

Figure 4 for FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design

Abstract:Robust network design, which aims to guarantee network availability under various failure scenarios while optimizing performance/cost objectives, has received significant attention. Existing approaches often rely on model-based mixed-integer optimization that is hard to scale or employ deep learning to solve specific engineering problems yet with limited generalizability. In this paper, we show that failure evaluation provides a common kernel to improve the tractability and scalability of existing solutions. By providing a neural network function approximation of this common kernel using graph attention networks, we develop a unified learning-based framework, FERN, for scalable Failure Evaluation and Robust Network design. FERN represents rich problem inputs as a graph and captures both local and global views by attentively performing feature extraction from the graph. It enables a broad range of robust network design problems, including robust network validation, network upgrade optimization, and fault-tolerant traffic engineering that are discussed in this paper, to be recasted with respect to the common kernel and thus computed efficiently using neural networks and over a small set of critical failure scenarios. Extensive experiments on real-world network topologies show that FERN can efficiently and accurately identify key failure scenarios for both OSPF and optimal routing scheme, and generalizes well to different topologies and input traffic patterns. It can speed up multiple robust network design problems by more than 80x, 200x, 10x, respectively with negligible performance gap.

Via

Access Paper or Ask Questions

A Hard Label Black-box Adversarial Attack Against Graph Neural Networks

Aug 21, 2021

Jiaming Mu, Binghui Wang, Qi Li, Kun Sun, Mingwei Xu, Zhuotao Liu

Figure 1 for A Hard Label Black-box Adversarial Attack Against Graph Neural Networks

Figure 2 for A Hard Label Black-box Adversarial Attack Against Graph Neural Networks

Figure 3 for A Hard Label Black-box Adversarial Attack Against Graph Neural Networks

Figure 4 for A Hard Label Black-box Adversarial Attack Against Graph Neural Networks

Abstract:Graph Neural Networks (GNNs) have achieved state-of-the-art performance in various graph structure related tasks such as node classification and graph classification. However, GNNs are vulnerable to adversarial attacks. Existing works mainly focus on attacking GNNs for node classification; nevertheless, the attacks against GNNs for graph classification have not been well explored. In this work, we conduct a systematic study on adversarial attacks against GNNs for graph classification via perturbing the graph structure. In particular, we focus on the most challenging attack, i.e., hard label black-box attack, where an attacker has no knowledge about the target GNN model and can only obtain predicted labels through querying the target model.To achieve this goal, we formulate our attack as an optimization problem, whose objective is to minimize the number of edges to be perturbed in a graph while maintaining the high attack success rate. The original optimization problem is intractable to solve, and we relax the optimization problem to be a tractable one, which is solved with theoretical convergence guarantee. We also design a coarse-grained searching algorithm and a query-efficient gradient computation algorithm to decrease the number of queries to the target GNN model. Our experimental results on three real-world datasets demonstrate that our attack can effectively attack representative GNNs for graph classification with less queries and perturbations. We also evaluate the effectiveness of our attack under two defenses: one is well-designed adversarial graph detector and the other is that the target GNN model itself is equipped with a defense to prevent adversarial graph generation. Our experimental results show that such defenses are not effective enough, which highlights more advanced defenses.

Via

Access Paper or Ask Questions

Optimal Multi-Manipulator Arm Placement for Maximal Dexterity during Robotics Surgery

Apr 13, 2021

James Di, Mingwei Xu, Nikhil Das, Michael C. Yip

Figure 1 for Optimal Multi-Manipulator Arm Placement for Maximal Dexterity during Robotics Surgery

Figure 2 for Optimal Multi-Manipulator Arm Placement for Maximal Dexterity during Robotics Surgery

Figure 3 for Optimal Multi-Manipulator Arm Placement for Maximal Dexterity during Robotics Surgery

Figure 4 for Optimal Multi-Manipulator Arm Placement for Maximal Dexterity during Robotics Surgery

Abstract:Robot arm placements are oftentimes a limitation in surgical preoperative procedures, relying on trained staff to evaluate and decide on the optimal positions for the arms. Given new and different patient anatomies, it can be challenging to make an informed choice, leading to more frequently colliding arms or limited manipulator workspaces. In this paper, we develop a method to generate the optimal manipulator base positions for the multi-port da Vinci surgical system that minimizes self-collision and environment-collision, and maximizes the surgeon's reachability inside the patient. Scoring functions are defined for each criterion so that they may be optimized over. Since for multi-manipulator setups, a large number of free parameters are available to adjust the base positioning of each arm, a challenge becomes how one can expediently assess possible setups. We thus also propose methods that perform fast queries of each measure with the use of a proxy collision-checker. We then develop an optimization method to determine the optimal position using the scoring functions. We evaluate the optimality of the base positions for the robot arms on canonical trajectories, and show that the solution yielded by the optimization program can satisfy each criterion. The metrics and optimization strategy are generalizable to other surgical robotic platforms so that patient-side manipulator positioning may be optimized and solved.

Via

Access Paper or Ask Questions

Data Poisoning Attacks to Deep Learning Based Recommender Systems

Jan 08, 2021

Hai Huang, Jiaming Mu, Neil Zhenqiang Gong, Qi Li, Bin Liu, Mingwei Xu

Figure 1 for Data Poisoning Attacks to Deep Learning Based Recommender Systems

Figure 2 for Data Poisoning Attacks to Deep Learning Based Recommender Systems

Figure 3 for Data Poisoning Attacks to Deep Learning Based Recommender Systems

Figure 4 for Data Poisoning Attacks to Deep Learning Based Recommender Systems

Abstract:Recommender systems play a crucial role in helping users to find their interested information in various web services such as Amazon, YouTube, and Google News. Various recommender systems, ranging from neighborhood-based, association-rule-based, matrix-factorization-based, to deep learning based, have been developed and deployed in industry. Among them, deep learning based recommender systems become increasingly popular due to their superior performance. In this work, we conduct the first systematic study on data poisoning attacks to deep learning based recommender systems. An attacker's goal is to manipulate a recommender system such that the attacker-chosen target items are recommended to many users. To achieve this goal, our attack injects fake users with carefully crafted ratings to a recommender system. Specifically, we formulate our attack as an optimization problem, such that the injected ratings would maximize the number of normal users to whom the target items are recommended. However, it is challenging to solve the optimization problem because it is a non-convex integer programming problem. To address the challenge, we develop multiple techniques to approximately solve the optimization problem. Our experimental results on three real-world datasets, including small and large datasets, show that our attack is effective and outperforms existing attacks. Moreover, we attempt to detect fake users via statistical analysis of the rating patterns of normal and fake users. Our results show that our attack is still effective and outperforms existing attacks even if such a detector is deployed.

* To appear in NDSS 2021

Via

Access Paper or Ask Questions

Explaining Deep Learning-Based Networked Systems

Oct 09, 2019

Zili Meng, Minhu Wang, Mingwei Xu, Hongzi Mao, Jiasong Bai, Hongxin Hu

Figure 1 for Explaining Deep Learning-Based Networked Systems

Figure 2 for Explaining Deep Learning-Based Networked Systems

Figure 3 for Explaining Deep Learning-Based Networked Systems

Figure 4 for Explaining Deep Learning-Based Networked Systems

Abstract:While deep learning (DL)-based networked systems have shown great potential in various applications, a key drawback is that Deep Neural Networks (DNNs) in DL are blackboxes and nontransparent for network operators. The lack of interpretability makes DL-based networked systems challenging to operate and troubleshoot, which further prevents DL-based networked systems from deploying in practice. In this paper, we propose TranSys, a novel framework to explain DL-based networked systems for practical deployment. Transys categorizes current DL-based networked systems and introduces different explanation methods based on decision tree and hypergraph to effectively explain DL-based networked systems. TranSys can explain the DNN policies in the form of decision trees and highlight critical components based on analysis over hypergraph. We evaluate TranSys over several typical DL-based networked systems and demonstrate that Transys can provide human-readable explanations for network operators. We also present three use cases of Transys, which could (i) help network operators troubleshoot DL-based networked systems, (ii) improve the decision latency and resource consumption of DL-based networked systems by ~10x on different metrics, and (iii) provide suggestions on daily operations for network operators when incidences occur.

* 21 pages, 25 figures

Via

Access Paper or Ask Questions