Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anwar Walid

Distilling Large Language Models for Network Active Queue Management

Jan 28, 2025

Deol Satish, Shiva Raj Pokhrel, Jonathan Kua, Anwar Walid

Abstract:The growing complexity of network traffic and demand for ultra-low latency communication require smarter packet traffic management. Existing Deep Learning-based queuing approaches struggle with dynamic network scenarios and demand high engineering effort. We propose AQM-LLM, distilling Large Language Models (LLMs) with few-shot learning, contextual understanding, and pattern recognition to improve Active Queue Management (AQM) [RFC 9330] with minimal manual effort. We consider a specific case where AQM is Low Latency, Low Loss, and Scalable Throughput (L4S) and our design of AQM-LLM builds on speculative decoding and reinforcement-based distilling of LLM by tackling congestion prevention in the L4S architecture using Explicit Congestion Notification (ECN) [RFC 9331] and periodic packet dropping. We develop a new open-source experimental platform by executing L4S-AQM on FreeBSD-14, providing interoperable modules to support LLM integration and facilitate IETF recognition through wider testing. Our extensive evaluations show L4S-LLM enhances queue management, prevents congestion, reduces latency, and boosts network performance, showcasing LLMs' adaptability and efficiency in uplifting AQM systems.

* IEEE Trans on Networking, 2025
* 11 pages

Via

Access Paper or Ask Questions

FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing

Feb 21, 2024

Xiao-Yang Liu, Jie Zhang, Guoxuan Wang, Weiqing Tong, Anwar Walid

Abstract:Large language models (LLMs) are computationally intensive. The computation workload and the memory footprint grow quadratically with the dimension (layer width). Most of LLMs' parameters come from the linear layers of the transformer structure and are highly redundant. These linear layers contribute more than 80% of the computation workload and 99% of the model size. To pretrain and finetune LLMs efficiently, there are three major challenges to address: 1) reducing redundancy of the linear layers; 2) reducing GPU memory footprint; 3) improving GPU utilization when using distributed training. Prior methods, such as LoRA and QLoRA, utilized low-rank matrices and quantization to reduce the number of trainable parameters and model size, respectively. However, the resulting model still consumes a large amount of GPU memory. In this paper, we present high-performance GPU-based methods that exploit low-rank structures to pretrain and finetune LLMs for financial applications. We replace one conventional linear layer of the transformer structure with two narrower linear layers, which allows us to reduce the number of parameters by several orders of magnitude. By quantizing the parameters into low precision (8-bit and 4-bit), the memory consumption of the resulting model is further reduced. Compared with existing LLMs, our methods achieve a speedup of 1.3X and a model compression ratio of 2.64X for pretaining without accuracy drop. For finetuning, our methods achieve an average accuracy increase of 6.3% and 24.0% in general tasks and financial tasks, respectively, and GPU memory consumption ratio of 6.3X. The sizes of our models are smaller than 0.59 GB, allowing inference on a smartphone.

Via

Access Paper or Ask Questions

Deep Reinforcement Learning for Traffic Light Control in Intelligent Transportation Systems

Feb 04, 2023

Xiao-Yang Liu, Ming Zhu, Sem Borst, Anwar Walid

Abstract:Smart traffic lights in intelligent transportation systems (ITSs) are envisioned to greatly increase traffic efficiency and reduce congestion. Deep reinforcement learning (DRL) is a promising approach to adaptively control traffic lights based on the real-time traffic situation in a road network. However, conventional methods may suffer from poor scalability. In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior ``greenwave" (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally a grid road network, which is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations has the capability to produce on its own a high-level intelligent behavior in a grid road network, namely, the ``greenwave" policy emerges. We also verify the ``greenwave" patterns in a $5 \times 10$ grid road network. Thirdly, the ``greenwave" patterns demonstrate that DRL algorithms produce favorable solutions since the ``greenwave" policy shown in experiment results is proved to be optimal in a specified traffic model (an avenue with multiple cross streets). The delivered policies both in a single road intersection and a grid road network demonstrate the scalability of DRL algorithms.

* IEEE Transactions on Network Science and Engineering, 2023
* 17 pages

Via

Access Paper or Ask Questions

Fair and Efficient Distributed Edge Learning with Hybrid Multipath TCP

Nov 03, 2022

Shiva Raj Pokhrel, Jinho Choi, Anwar Walid

Abstract:The bottleneck of distributed edge learning (DEL) over wireless has shifted from computing to communication, primarily the aggregation-averaging (Agg-Avg) process of DEL. The existing transmission control protocol (TCP)-based data networking schemes for DEL are application-agnostic and fail to deliver adjustments according to application layer requirements. As a result, they introduce massive excess time and undesired issues such as unfairness and stragglers. Other prior mitigation solutions have significant limitations as they balance data flow rates from workers across paths but often incur imbalanced backlogs when the paths exhibit variance, causing stragglers. To facilitate a more productive DEL, we develop a hybrid multipath TCP (MPTCP) by combining model-based and deep reinforcement learning (DRL) based MPTCP for DEL that strives to realize quicker iteration of DEL and better fairness (by ameliorating stragglers). Hybrid MPTCP essentially integrates two radical TCP developments: i) successful existing model-based MPTCP control strategies and ii) advanced emerging DRL-based techniques, and introduces a novel hybrid MPTCP data transport for easing the communication of the Agg-Avg process. Extensive emulation results demonstrate that the proposed hybrid MPTCP can overcome excess time consumption and ameliorate the application layer unfairness of DEL effectively without injecting additional inconstancy and stragglers.

* 13 pages, 15 figures

Via

Access Paper or Ask Questions

Regenerative Particle Thompson Sampling

Mar 15, 2022

Zeyu Zhou, Bruce Hajek, Nakjung Choi, Anwar Walid

Figure 1 for Regenerative Particle Thompson Sampling

Figure 2 for Regenerative Particle Thompson Sampling

Figure 3 for Regenerative Particle Thompson Sampling

Figure 4 for Regenerative Particle Thompson Sampling

Abstract:This paper proposes regenerative particle Thompson sampling (RPTS), a flexible variation of Thompson sampling. Thompson sampling itself is a Bayesian heuristic for solving stochastic bandit problems, but it is hard to implement in practice due to the intractability of maintaining a continuous posterior distribution. Particle Thompson sampling (PTS) is an approximation of Thompson sampling obtained by simply replacing the continuous distribution by a discrete distribution supported at a set of weighted static particles. We observe that in PTS, the weights of all but a few fit particles converge to zero. RPTS is based on the heuristic: delete the decaying unfit particles and regenerate new particles in the vicinity of fit surviving particles. Empirical evidence shows uniform improvement from PTS to RPTS and flexibility and efficacy of RPTS across a set of representative bandit problems, including an application to 5G network slicing.

* Mainbody 14 pages, appendix 32 pages, 16 figures

Via

Access Paper or Ask Questions

ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

Dec 11, 2021

Xiao-Yang Liu, Zechu Li, Zhuoran Yang, Jiahao Zheng, Zhaoran Wang, Anwar Walid, Jian Guo, Michael I. Jordan

Figure 1 for ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

Figure 2 for ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

Figure 3 for ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

Figure 4 for ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

Abstract:Deep reinforcement learning (DRL) has revolutionized learning and actuation in applications such as game playing and robotic control. The cost of data collection, i.e., generating transitions from agent-environment interactions, remains a major challenge for wider DRL adoption in complex real-world problems. Following a cloud-native paradigm to train DRL agents on a GPU cloud platform is a promising solution. In this paper, we present a scalable and elastic library ElegantRL-podracer for cloud-native deep reinforcement learning, which efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels. At a high-level, ElegantRL-podracer employs a tournament-based ensemble scheme to orchestrate the training process on hundreds or even thousands of GPUs, scheduling the interactions between a leaderboard and a training pool with hundreds of pods. At a low-level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU CUDA cores in a single GPU. Our ElegantRL-podracer library features high scalability, elasticity and accessibility by following the development principles of containerization, microservices and MLOps. Using an NVIDIA DGX SuperPOD cloud, we conduct extensive experiments on various tasks in locomotion and stock trading and show that ElegantRL-podracer substantially outperforms RLlib. Our codes are available on GitHub.

* Deep Reinforcement Learning Workshop, NeurIPS 2021
* 9 pages, 7 figures

Via

Access Paper or Ask Questions

Emerging Trends in Federated Learning: From Model Fusion to Federated X Learning

Feb 25, 2021

Shaoxiong Ji, Teemu Saravirta, Shirui Pan, Guodong Long, Anwar Walid

Figure 1 for Emerging Trends in Federated Learning: From Model Fusion to Federated X Learning

Abstract:Federated learning is a new learning paradigm that decouples data collection and model training via multi-party computation and model aggregation. As a flexible learning setting, federated learning has the potential to integrate with other learning frameworks. We conduct a focused survey of federated learning in conjunction with other learning algorithms. Specifically, we explore various learning algorithms to improve the vanilla federated averaging algorithm and review model fusion methods such as adaptive aggregation, regularization, clustered methods, and Bayesian methods. Following the emerging trends, we also discuss federated learning in the intersection with other learning paradigms, termed as federated x learning, where x includes multitask learning, meta-learning, transfer learning, unsupervised learning, and reinforcement learning. This survey reviews the state of the art, challenges, and future directions.

Via

Access Paper or Ask Questions

Privacy-preserving Decentralized Aggregation for Federated Learning

Dec 28, 2020

Beomyeol Jeon, S. M. Ferdous, Muntasir Raihan Rahman, Anwar Walid

Figure 1 for Privacy-preserving Decentralized Aggregation for Federated Learning

Figure 2 for Privacy-preserving Decentralized Aggregation for Federated Learning

Figure 3 for Privacy-preserving Decentralized Aggregation for Federated Learning

Figure 4 for Privacy-preserving Decentralized Aggregation for Federated Learning

Abstract:Federated learning is a promising framework for learning over decentralized data spanning multiple regions. This approach avoids expensive central training data aggregation cost and can improve privacy because distributed sites do not have to reveal privacy-sensitive data. In this paper, we develop a privacy-preserving decentralized aggregation protocol for federated learning. We formulate the distributed aggregation protocol with the Alternating Direction Method of Multiplier (ADMM) and examine its privacy weakness. Unlike prior work that use Differential Privacy or homomorphic encryption for privacy, we develop a protocol that controls communication among participants in each round of aggregation to minimize privacy leakage. We establish its privacy guarantee against an honest-but-curious adversary. We also propose an efficient algorithm to construct such a communication pattern, inspired by combinatorial block design theory. Our secure aggregation protocol based on this novel group communication pattern design leads to an efficient algorithm for federated training with privacy guarantees. We evaluate our federated training algorithm on image classification and next-word prediction applications over benchmark datasets with 9 and 15 distributed sites. Evaluation results show that our algorithm performs comparably to the standard centralized federated learning method while preserving privacy; the degradation in test accuracy is only up to 0.73%.

* 10 pages, 6 figures

Via

Access Paper or Ask Questions

Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning

Mar 21, 2020

Shaoxiong Ji, Wenqi Jiang, Anwar Walid, Xue Li

Figure 1 for Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning

Figure 2 for Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning

Figure 3 for Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning

Figure 4 for Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning

Abstract:Federated learning (FL) is a novel machine learning setting which enables on-device intelligence via decentralized training and federated optimization. The rapid development of deep neural networks facilitates the learning techniques for modeling complex problems and emerges into federated deep learning under the federated setting. However, the tremendous amount of model parameters burdens the communication network with a high load of transportation. This paper introduces two approaches for improving communication efficiency by dynamic sampling and top-$k$ selective masking. The former controls the fraction of selected client models dynamically, while the latter selects parameters with top-$k$ largest values of difference for federated updating. Experiments on convolutional image classification and recurrent language modeling are conducted on three public datasets to show the effectiveness of our proposed methods.

* 15 pages, 9 figures

Via

Access Paper or Ask Questions

Deep Reinforcement Learning for Intelligent Transportation Systems

Dec 03, 2018

Xiao-Yang Liu, Zihan Ding, Sem Borst, Anwar Walid

Figure 1 for Deep Reinforcement Learning for Intelligent Transportation Systems

Figure 2 for Deep Reinforcement Learning for Intelligent Transportation Systems

Figure 3 for Deep Reinforcement Learning for Intelligent Transportation Systems

Figure 4 for Deep Reinforcement Learning for Intelligent Transportation Systems

Abstract:Intelligent Transportation Systems (ITSs) are envisioned to play a critical role in improving traffic flow and reducing congestion, which is a pervasive issue impacting urban areas around the globe. Rapidly advancing vehicular communication and edge cloud computation technologies provide key enablers for smart traffic management. However, operating viable real-time actuation mechanisms on a practically relevant scale involves formidable challenges, e.g., policy iteration and conventional Reinforcement Learning (RL) techniques suffer from poor scalability due to state space explosion. Motivated by these issues, we explore the potential for Deep Q-Networks (DQN) to optimize traffic light control policies. As an initial benchmark, we establish that the DQN algorithms yield the "thresholding" policy in a single-intersection. Next, we examine the scalability properties of DQN algorithms and their performance in a linear network topology with several intersections along a main artery. We demonstrate that DQN algorithms produce intelligent behavior, such as the emergence of "greenwave" patterns, reflecting their ability to learn favorable traffic light actuations.

Via

Access Paper or Ask Questions