Abstract:Wireless federated learning (WFL) suffers from heterogeneity prevailing in the data distributions, computing powers, and channel conditions of participating devices. This paper presents a new Federated Learning with Adjusted leaRning ratE (FLARE) framework to mitigate the impact of the heterogeneity. The key idea is to allow the participating devices to adjust their individual learning rates and local training iterations, adapting to their instantaneous computing powers. The convergence upper bound of FLARE is established rigorously under a general setting with non-convex models in the presence of non-i.i.d. datasets and imbalanced computing powers. By minimizing the upper bound, we further optimize the scheduling of FLARE to exploit the channel heterogeneity. A nested problem structure is revealed to facilitate iteratively allocating the bandwidth with binary search and selecting devices with a new greedy method. A linear problem structure is also identified and a low-complexity linear programming scheduling policy is designed when training models have large Lipschitz constants. Experiments demonstrate that FLARE consistently outperforms the baselines in test accuracy, and converges much faster with the proposed scheduling policy.
Abstract:Local stochastic gradient descent (SGD) is a fundamental approach in achieving communication efficiency in Federated Learning (FL) by allowing individual workers to perform local updates. However, the presence of heterogeneous data distributions across working nodes causes each worker to update its local model towards a local optimum, leading to the phenomenon known as ``client-drift" and resulting in slowed convergence. To address this issue, previous works have explored methods that either introduce communication overhead or suffer from unsteady performance. In this work, we introduce a novel metric called ``degree of divergence," quantifying the angle between the local gradient and the global reference direction. Leveraging this metric, we propose the divergence-based adaptive aggregation (DRAG) algorithm, which dynamically ``drags" the received local updates toward the reference direction in each round without requiring extra communication overhead. Furthermore, we establish a rigorous convergence analysis for DRAG, proving its ability to achieve a sublinear convergence rate. Compelling experimental results are presented to illustrate DRAG's superior performance compared to state-of-the-art algorithms in effectively managing the client-drift phenomenon. Additionally, DRAG exhibits remarkable resilience against certain Byzantine attacks. By securely sharing a small sample of the client's data with the FL server, DRAG effectively counters these attacks, as demonstrated through comprehensive experiments.
Abstract:The widespread adoption of edge computing has emerged as a prominent trend for alleviating task processing delays and reducing energy consumption. However, the dynamic nature of network conditions and the varying computation capacities of edge servers (ESs) can introduce disparities between computation loads and available computing resources in edge computing networks, potentially leading to inadequate service quality. To address this challenge, this paper investigates a practical scenario characterized by dynamic task offloading. Initially, we examine traditional Multi-armed Bandit (MAB) algorithms, namely the $\varepsilon$-greedy algorithm and the UCB1-based algorithm. However, both algorithms exhibit certain weaknesses in effectively addressing the tidal data traffic patterns. Consequently, based on MAB, we propose an adaptive task offloading algorithm (ATOA) that overcomes these limitations. By conducting extensive simulations, we demonstrate the superiority of our ATOA solution in reducing task processing latency compared to conventional MAB methods. This substantiates the effectiveness of our approach in enhancing the performance of edge computing networks and improving overall service quality.
Abstract:Cloud native technology has revolutionized 5G beyond and 6G communication networks, offering unprecedented levels of operational automation, flexibility, and adaptability. However, the vast array of cloud native services and applications presents a new challenge in resource allocation for dynamic cloud computing environments. To tackle this challenge, we investigate a cloud native wireless architecture that employs container-based virtualization to enable flexible service deployment. We then study two representative use cases: network slicing and Multi-Access Edge Computing. To optimize resource allocation in these scenarios, we leverage deep reinforcement learning techniques and introduce two model-free algorithms capable of monitoring the network state and dynamically training allocation policies. We validate the effectiveness of our algorithms in a testbed developed using Free5gc. Our findings demonstrate significant improvements in network efficiency, underscoring the potential of our proposed techniques in unlocking the full potential of cloud native wireless networks.
Abstract:Recent advancements in space technology have equipped low Earth Orbit (LEO) satellites with the capability to perform complex functions and run AI applications. Federated Learning (FL) on LEO satellites enables collaborative training of a global ML model without the need for sharing large datasets. However, intermittent connectivity between satellites and ground stations can lead to stale gradients and unstable learning, thereby limiting learning performance. In this paper, we propose FedGSM, a novel asynchronous FL algorithm that introduces a compensation mechanism to mitigate gradient staleness. FedGSM leverages the deterministic and time-varying topology of the orbits to offset the negative effects of staleness. Our simulation results demonstrate that FedGSM outperforms state-of-the-art algorithms for both IID and non-IID datasets, underscoring its effectiveness and advantages. We also investigate the effect of system parameters.
Abstract:Synchronous local stochastic gradient descent (local SGD) suffers from some workers being idle and random delays due to slow and straggling workers, as it waits for the workers to complete the same amount of local updates. In this paper, to mitigate stragglers and improve communication efficiency, a novel local SGD strategy, named HetSyn, is developed. The key point is to keep all the workers computing continually at each synchronization round, and make full use of any effective (completed) local update of each worker regardless of stragglers. An analysis of the average wall-clock time, average number of local updates and average number of uploading workers per round is provided to gauge the performance of HetSyn. The convergence of HetSyn is also rigorously established even when the objective function is nonconvex. Experimental results show the superiority of the proposed HetSyn against state-of-the-art schemes through utilization of additional effective local updates at each worker, and the influence of system parameters is studied. By allowing heterogeneous synchronization with different numbers of local updates across workers, HetSyn provides substantial improvements both in time and communication efficiency.
Abstract:With rising male infertility, sperm head morphology classification becomes critical for accurate and timely clinical diagnosis. Recent deep learning (DL) morphology analysis methods achieve promising benchmark results, but leave performance and robustness on the table by relying on limited and possibly noisy class labels. To address this, we introduce a new DL training framework that leverages anatomical and image priors from human sperm microscopy crops to extract useful features without additional labeling cost. Our core idea is to distill sperm head information with reliably-generated pseudo-masks and unsupervised spatial prediction tasks. The predicted foreground masks from this distillation step are then leveraged to regularize and reduce image and label noise in the tuning stage. We evaluate our new approach on two public sperm datasets and achieve state-of-the-art performances (e.g. 65.9% SCIAN accuracy and 96.5% HuSHeM accuracy).
Abstract:Wall-clock convergence time and communication load are key performance metrics for the distributed implementation of stochastic gradient descent (SGD) in parameter server settings. Communication-adaptive distributed Adam (CADA) has been recently proposed as a way to reduce communication load via the adaptive selection of workers. CADA is subject to performance degradation in terms of wall-clock convergence time in the presence of stragglers. This paper proposes a novel scheme named grouping-based CADA (G-CADA) that retains the advantages of CADA in reducing the communication load, while increasing the robustness to stragglers at the cost of additional storage at the workers. G-CADA partitions the workers into groups of workers that are assigned the same data shards. Groups are scheduled adaptively at each iteration, and the server only waits for the fastest worker in each selected group. We provide analysis and experimental results to elaborate the significant gains on the wall-clock time, as well as communication load and computation load, of G-CADA over other benchmark schemes.
Abstract:Today's research in recommender systems is largely based on experimental designs that are static in a sense that they do not consider potential longitudinal effects of providing recommendations to users. In reality, however, various important and interesting phenomena only emerge or become visible over time, e.g., when a recommender system continuously reinforces the popularity of already successful artists on a music streaming site or when recommendations that aim at profit maximization lead to a loss of consumer trust in the long run. In this paper, we discuss how Agent-Based Modeling and Simulation (ABM) techniques can be used to study such important longitudinal dynamics of recommender systems. To that purpose, we provide an overview of the ABM principles, outline a simulation framework for recommender systems based on the literature, and discuss various practical research questions that can be addressed with such an ABM-based simulation framework.
Abstract:The Gleason grading system using histological images is the most powerful diagnostic and prognostic predictor of prostate cancer. The current standard inspection is evaluating Gleason H&E-stained histopathology images by pathologists. However, it is complicated, time-consuming, and subject to observers. Deep learning (DL) based-methods that automatically learn image features and achieve higher generalization ability have attracted significant attention. However, challenges remain especially using DL to train the whole slide image (WSI), a predominant clinical source in the current diagnostic setting, containing billions of pixels, morphological heterogeneity, and artifacts. Hence, we proposed a convolutional neural network (CNN)-based automatic classification method for accurate grading of PCa using whole slide histopathology images. In this paper, a data augmentation method named Patch-Based Image Reconstruction (PBIR) was proposed to reduce the high resolution and increase the diversity of WSIs. In addition, a distribution correction (DC) module was developed to enhance the adaption of pretrained model to the target dataset by adjusting the data distribution. Besides, a Quadratic Weighted Mean Square Error (QWMSE) function was presented to reduce the misdiagnosis caused by equal Euclidean distances. Our experiments indicated the combination of PBIR, DC, and QWMSE function was necessary for achieving superior expert-level performance, leading to the best results (0.8885 quadratic-weighted kappa coefficient).