Wenfei Wu

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

Feb 12, 2024

Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim

Abstract: As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems. Existing solutions, while aiming to mitigate this bottleneck through worker-level compression and in-network aggregation, fall short due to their inability to efficiently reconcile the trade-offs between compression effectiveness and computational overhead, hindering overall performance and scalability. In this paper, we introduce a novel compression algorithm that effectively merges worker-level compression with in-network aggregation. Our solution is both homomorphic, allowing for efficient in-network aggregation without CPU/GPU processing, and lossless, ensuring no compromise on training accuracy. Theoretically optimal in compression and computational efficiency, our approach is empirically validated across diverse DNN models such as NCF, LSTM, VGG19, and BERT-base, showing up to a 6.33$\times$ improvement in aggregation throughput and a 3.74$\times$ increase in per-iteration training speed.
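To make the "homomorphic" property concrete, here is a minimal sketch using a generic count-sketch-style linear compressor. It is not the algorithm from the paper, and it illustrates only the homomorphic part, not lossless recovery. The names `make_params`, `compress`, and `elementwise_add`, the integer (quantized) gradients, and the sizes are all assumptions made for illustration.

```python
import random


def make_params(dim, width, seed=0):
    """Pick a fixed bucket and sign per coordinate; all workers share them."""
    rng = random.Random(seed)
    buckets = [rng.randrange(width) for _ in range(dim)]
    signs = [rng.choice((-1, 1)) for _ in range(dim)]
    return buckets, signs


def compress(grad, buckets, signs, width):
    """Linear compression: each coordinate is added, with its sign, to one bucket."""
    out = [0] * width
    for i, g in enumerate(grad):
        out[buckets[i]] += signs[i] * g
    return out


def elementwise_add(a, b):
    """The only operation the aggregation point needs to perform."""
    return [x + y for x, y in zip(a, b)]


DIM, WIDTH = 16, 8
BUCKETS, SIGNS = make_params(DIM, WIDTH)

# Hypothetical quantized gradients from two workers.
worker_a = [0, 3, 0, 0, -2, 0, 0, 1, 0, 0, 4, 0, 0, 0, -1, 0]
worker_b = [1, 0, 0, 5, 0, 0, -3, 0, 0, 2, 0, 0, 0, 0, 0, 7]

# Path 1: aggregate the compressed vectors (what a switch could do).
sum_of_compressed = elementwise_add(
    compress(worker_a, BUCKETS, SIGNS, WIDTH),
    compress(worker_b, BUCKETS, SIGNS, WIDTH),
)

# Path 2: aggregate first, then compress.
compressed_sum = compress(elementwise_add(worker_a, worker_b), BUCKETS, SIGNS, WIDTH)

print(sum_of_compressed == compressed_sum)
```

Because the compressor is linear, adding compressed vectors gives the same result as compressing the added gradients, so the aggregation point never has to decompress anything.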

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Jan 17, 2022

Hao Wang, Yuxuan Qin, ChonLam Lao, Yanfang Le, Wenfei Wu, Kai Chen

Abstract: As the scale of distributed training grows, communication becomes a bottleneck. To accelerate communication, recent works introduce In-Network Aggregation (INA), which moves gradient summation into network middle-boxes (e.g., programmable switches) to reduce the traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although prior work applies methods like pool-based streaming or dynamic sharing to tackle the mismatch, switch memory is still a potential performance bottleneck. Furthermore, we observe that switch memory is under-utilized in recent works because of the synchronization requirement for aggregator deallocation. To improve switch memory utilization, we propose ESA, an $\underline{E}$fficient Switch Memory $\underline{S}$cheduler for In-Network $\underline{A}$ggregation. At its core, ESA enforces a preemptive aggregator allocation primitive and introduces priority scheduling at the data plane, which improves switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to $1.35\times$.
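The idea of preemptive, priority-based aggregator allocation can be sketched as a toy model. This is an illustration under stated assumptions, not ESA's actual data-plane design: the `AggregatorPool` class, the event names, the hash-to-slot mapping, and the choice to hand evicted partial sums and bypassed values back to the caller are all hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Slot:
    job: str
    chunk: int
    priority: int
    partial: int = 0   # running sum of the gradient values seen so far
    seen: int = 0      # number of worker packets aggregated so far


class AggregatorPool:
    """Toy model of a fixed pool of switch aggregators (hypothetical)."""

    def __init__(self, num_slots, num_workers):
        self.slots = [None] * num_slots
        self.num_workers = num_workers

    def on_packet(self, job, chunk, priority, value):
        """Process one gradient packet and return a list of events."""
        idx = zlib.crc32(f"{job}:{chunk}".encode()) % len(self.slots)
        slot = self.slots[idx]
        events = []
        if slot is not None and (slot.job, slot.chunk) != (job, chunk):
            if priority <= slot.priority:
                # Slot is held by an equal or higher priority task: do not aggregate.
                return [("bypass", value)]
            # Preempt: evict the lower-priority partial sum (assumed to be
            # finished elsewhere, e.g. at an end host).
            events.append(("evicted", (slot.job, slot.chunk, slot.partial)))
            slot = None
        if slot is None:
            slot = Slot(job, chunk, priority)
            self.slots[idx] = slot
        slot.partial += value
        slot.seen += 1
        if slot.seen == self.num_workers:
            self.slots[idx] = None          # free the aggregator immediately
            events.append(("done", slot.partial))
        else:
            events.append(("stored",))
        return events


pool = AggregatorPool(num_slots=1, num_workers=2)
print(pool.on_packet("low", 0, priority=1, value=5))
print(pool.on_packet("high", 0, priority=9, value=7))
print(pool.on_packet("low", 0, priority=1, value=6))
print(pool.on_packet("high", 0, priority=9, value=3))
```

With a single slot every packet collides, so the example shows a low-priority partial sum being evicted by a high-priority job, a later low-priority packet bypassing the switch, and the high-priority aggregation completing.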

Simultaneously Achieving Sublinear Regret and Constraint Violations for Online Convex Optimization with Time-varying Constraints

Nov 15, 2021

Qingsong Liu, Wenfei Wu, Longbo Huang, Zhixuan Fang

Abstract: In this paper, we develop a novel virtual-queue-based online algorithm for online convex optimization (OCO) problems with long-term and time-varying constraints and conduct a performance analysis with respect to the dynamic regret and constraint violations. We design a new update rule of dual variables and a new way of incorporating time-varying constraint functions into the dual variables. To the best of our knowledge, our algorithm is the first parameter-free algorithm to simultaneously achieve sublinear dynamic regret and constraint violations. Our proposed algorithm also outperforms the state-of-the-art results in many aspects, e.g., our algorithm does not require the Slater condition. Meanwhile, for a group of practical and widely-studied constrained OCO problems in which the variation of consecutive constraints is smooth enough across time, our algorithm achieves $O(1)$ constraint violations. Furthermore, we extend our algorithm and analysis to the case when the time horizon $T$ is unknown. Finally, numerical experiments are conducted to validate the theoretical guarantees of our algorithm, and some applications of our proposed framework will be outlined.

* Proceedings of the 39th International Symposium on Computer Performance, Modeling, Measurements and Evaluation (Performance), 2021
* 31 pages; accepted at Performance 2021
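For readers unfamiliar with virtual queues in constrained OCO, the following is a minimal sketch of a generic, textbook-style virtual-queue update. It is not the new dual update rule proposed in the paper. The loss $(x-2)^2$, the fixed constraint $x - 1 \le 0$, the step size `eta`, and the function names are assumptions chosen only to show the mechanism; the paper itself treats time-varying constraints.

```python
def project(x, lo, hi):
    """Project x onto the feasible interval [lo, hi]."""
    return max(lo, min(hi, x))


def run_virtual_queue_oco(steps=2000, eta=0.05, lo=-5.0, hi=5.0):
    """Generic virtual-queue primal-dual loop on a 1-D toy problem.

    Loss:        f(x) = (x - 2)^2   (unconstrained minimizer at x = 2)
    Constraint:  g(x) = x - 1 <= 0  (so the constrained optimum is x = 1)
    """
    x, queue = 0.0, 0.0
    for _ in range(steps):
        grad_f = 2.0 * (x - 2.0)   # gradient of the loss at the current point
        grad_g = 1.0               # gradient of the constraint function
        # Primal step: descend on the loss plus the queue-weighted constraint.
        x = project(x - eta * (grad_f + queue * grad_g), lo, hi)
        # Virtual queue: accumulates violation and never drops below zero.
        queue = max(0.0, queue + (x - 1.0))
    return x, queue


final_x, final_queue = run_virtual_queue_oco()
print(final_x, final_queue)
```

The queue acts as a dual variable: it grows while the constraint is violated and pushes the iterate back toward the feasible region, which is the mechanism that keeps long-term violation under control.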

Indoor Positioning System based on Visible Light Communication for Mobile Robot in Nuclear Power Plant

Nov 16, 2020

Hongyun Xie, Linyi Huang, Wenfei Wu

Abstract: Visible light positioning (VLP) is widely believed to be a cost-effective answer to the growing demand for indoor robot positioning. Because some extreme environments require robots to carry a precise, radiation-resistant indoor positioning system to perform difficult work, we propose a novel high-accuracy VLP system for long-duration inspection and intervention in radiation environments. A system with sufficient radiation tolerance is critical for operational inspection, maintenance, and intervention tasks in nuclear facilities. First, we design an intelligent LED lamp with a visible light communication (VLC) function to dynamically create an indoor, GPS-like tracking system. By installing the proposed lamps in place of standard lighting at key locations in the nuclear power plant, the system can improve the safety of the mobile robot and support efficient inspection over a large-scale site. Second, to enhance the radiation tolerance and multi-scenario applicability of the system, we propose a shielding protection method for the camera installed vertically on the robot, which ensures that the camera's image elements, and thus the captured VLP information, are not affected by radiation. In addition, with an optimized visible light positioning algorithm based on a dispersion calibration method, the proposed VLP system achieves an average positioning accuracy of 0.82 cm and keeps 90% of positioning errors below 1.417 cm. The proposed system therefore not only has sufficient radiation tolerance but also achieves state-of-the-art positioning accuracy in the visible light positioning field.

* 8 pages, 10 figures
