Ludmila Cherkasova

Async-HFL: Efficient and Robust Asynchronous Federated Learning in Hierarchical IoT Networks

Jan 17, 2023

Xiaofan Yu, Ludmila Cherkasova, Harsh Vardhan, Quanling Zhao, Emily Ekaireb, Xiyuan Zhang, Arya Mazumdar, Tajana Rosing

Abstract: Federated Learning (FL) has gained increasing interest in recent years as a distributed on-device learning paradigm. However, multiple challenges remain to be addressed for deploying FL in real-world Internet-of-Things (IoT) networks with hierarchies. Although existing works have proposed various approaches to account for data heterogeneity, system heterogeneity, unexpected stragglers, and scalability, none of them provides a systematic solution that addresses all of these challenges in a hierarchical and unreliable IoT network. In this paper, we propose an asynchronous and hierarchical framework (Async-HFL) for performing FL in a common three-tier IoT network architecture. In response to widely varying delays, Async-HFL employs asynchronous aggregations at both the gateway and the cloud levels, thereby avoiding long waiting times. To fully unleash the potential of Async-HFL in convergence speed under system heterogeneity and stragglers, we design device selection at the gateway level and device-gateway association at the cloud level. Device selection chooses edge devices to trigger local training in real time, while device-gateway association determines the network topology periodically after several cloud epochs; both satisfy bandwidth limitations. We evaluate Async-HFL's convergence speedup using large-scale simulations based on ns-3 and a network topology from NYCMesh. Our results show that Async-HFL converges 1.08-1.31x faster in wall-clock time and saves up to 21.6% of total communication cost compared to state-of-the-art asynchronous FL algorithms (with client selection). We further validate Async-HFL on a physical deployment and observe robust convergence under unexpected stragglers.

* Submitted for review
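The abstract does not spell out the aggregation rule, so the following is a minimal sketch, assuming a FedAsync-style update (not necessarily Async-HFL's exact rule) in which each arriving model is mixed into the current one with a weight that shrinks as the update grows staler. The class name `AsyncAggregator`, the base mixing rate `alpha`, and the square-root decay are hypothetical choices for illustration; the same class is reused at the gateway tier and the cloud tier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsyncAggregator:
    """Hypothetical staleness-aware asynchronous aggregator.

    A gateway (merging device updates) or the cloud (merging gateway models)
    could each own one instance and merge every update the moment it arrives.
    """

    model: List[float]
    alpha: float = 0.5  # hypothetical base mixing rate
    version: int = 0  # number of merges performed so far

    def staleness_weight(self, staleness: int) -> float:
        # Polynomial decay: the more merges happened since the sender pulled
        # the model, the less weight its update receives.
        return self.alpha / (1 + staleness) ** 0.5

    def merge(self, update: List[float], based_on_version: int) -> float:
        """Mix one incoming update into the model; return the weight used."""
        staleness = max(0, self.version - based_on_version)
        w = self.staleness_weight(staleness)
        self.model = [(1 - w) * m + w * u for m, u in zip(self.model, update)]
        self.version += 1
        return w


if __name__ == "__main__":
    gateway = AsyncAggregator(model=[0.0, 0.0])
    cloud = AsyncAggregator(model=[0.0, 0.0])

    # Two devices trained from gateway version 0; the second one arrives late.
    gateway.merge([1.0, 1.0], based_on_version=0)  # staleness 0, weight 0.5
    gateway.merge([1.0, 1.0], based_on_version=0)  # staleness 1, weight ~0.354

    # The gateway forwards its model to the cloud without waiting for peers.
    cloud.merge(gateway.model, based_on_version=cloud.version)
    print(gateway.model, cloud.model)
```

Because neither tier blocks on its slowest child, a straggler only delays its own contribution, and its stale update is discounted rather than discarded. Device selection and device-gateway association, which the abstract also describes, are not modeled here.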

DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model

Dec 02, 2022

Shreshth Tuli, Giuliano Casale, Ludmila Cherkasova, Nicholas R. Jennings

Abstract: The emergence of latency-critical AI applications has been supported by the evolution of the edge computing paradigm. However, edge solutions are typically resource-constrained, posing reliability challenges due to heightened contention for compute and communication capacities and faulty application behavior under overload conditions. Although the large amount of generated log data can be mined for fault prediction, labeling this data for training is a manual process and thus a limiting factor for automation. As a result, many companies resort to unsupervised fault-tolerance models. Yet, failure models of this kind can lose accuracy when they need to adapt to non-stationary workloads and diverse host characteristics. To cope with this, we propose a novel modeling approach, called DeepFT, that proactively avoids system overloads and their adverse effects by optimizing task scheduling and migration decisions. DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system and co-simulation-based self-supervised learning to dynamically adapt the model in volatile settings. It offers a highly scalable solution, as the model size grows by only 3% and 1% per unit increase in the number of active tasks and hosts, respectively. Extensive experimentation on a Raspberry Pi-based edge cluster with DeFog benchmarks shows that DeepFT can outperform state-of-the-art baseline methods in fault-detection and QoS metrics. Specifically, DeepFT gives the highest F1 scores for fault detection, reducing service deadline violations by up to 37% while also improving response time by up to 9%.

* Accepted in IEEE INFOCOM 2023
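To make surrogate-guided scheduling concrete, here is a minimal sketch under heavy simplifying assumptions: a hand-written function `toy_surrogate` stands in for DeepFT's learned deep surrogate and scores a task-to-host placement by predicted overload, and the scheduler picks the placement with the lowest score by exhaustive search. The abstract does not describe DeepFT's actual optimizer or model inputs, so every name and number here is hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# placement[i] is the index of the host that runs task i.
Placement = Tuple[int, ...]


def toy_surrogate(placement: Placement, demand: Sequence[float],
                  capacity: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned surrogate.

    Returns a 'predicted fault score': the total load above capacity across
    hosts. A learned model would instead infer this from monitoring data.
    """
    load = [0.0] * len(capacity)
    for task, host in enumerate(placement):
        load[host] += demand[task]
    return sum(max(0.0, used - cap) for used, cap in zip(load, capacity))


def choose_placement(demand: Sequence[float], capacity: Sequence[float],
                     surrogate: Callable[..., float]) -> Placement:
    """Return the candidate placement with the lowest surrogate score.

    Exhaustive search over len(capacity) ** len(demand) placements is only
    feasible for toy sizes; it is used here purely for clarity.
    """
    candidates = product(range(len(capacity)), repeat=len(demand))
    return min(candidates, key=lambda p: surrogate(p, demand, capacity))


if __name__ == "__main__":
    demand = [0.6, 0.5, 0.3]  # hypothetical CPU demand per task
    capacity = [1.0, 0.8]  # hypothetical CPU capacity per host
    best = choose_placement(demand, capacity, toy_surrogate)
    print(best, toy_surrogate(best, demand, capacity))
```

Swapping `toy_surrogate` for a trained model leaves `choose_placement` unchanged, which is the appeal of a surrogate-based design. The self-supervised, co-simulation-based retraining that keeps the model accurate under changing workloads is not shown.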

Combining Individual and Joint Networking Behavior for Intelligent IoT Analytics

Mar 07, 2022

Jeya Vikranth Jeyakumar, Ludmila Cherkasova, Saina Lajevardi, Moray Allan, Yue Zhao, John Fry, Mani Srivastava

Abstract: The IoT vision of a trillion connected devices over the next decade requires reliable end-to-end connectivity and automated device management platforms. While we have seen successful efforts for maintaining small IoT testbeds, there are multiple challenges for the efficient management of large-scale device deployments. With Industrial IoT deployments incorporating millions of devices, traditional management methods do not scale well. In this work, we address these challenges by designing a set of novel machine learning techniques, which form the foundation of a new tool, IoTelligent, for IoT device management, using traffic characteristics obtained at the network level. The design of our tool is driven by the analysis of one year of networking data collected from 350 companies with IoT deployments. The exploratory analysis of this data reveals that IoT environments follow the famous Pareto principle: (i) 10% of the companies in the dataset contribute 90% of the entire traffic; (ii) 7% of all the companies in the set own 90% of all the devices. We designed and evaluated CNN, LSTM, and Convolutional LSTM models for demand forecasting and concluded that the Convolutional LSTM model performs best. However, maintaining and updating individual company models is expensive. In this work, we design a novel, scalable approach, where a general demand forecasting model is built using the combined data of all the companies with a normalization factor. Moreover, we introduce a novel technique for device management based on autoencoders, which automatically extract relevant device features to identify groups of devices with similar behavior and flag anomalous devices.

* In International Conference on Internet of Things, 2020 Sep 18 (pp. 45-62). Springer, Cham
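The abstract states that one general forecasting model is trained on the combined data of all companies with a normalization factor, but not which factor or model. A minimal sketch, assuming each company's mean traffic as its normalization factor and substituting a least-squares AR(1) model for the Convolutional LSTM: every series is divided by its factor, the normalized series are pooled to fit one shared model, and each forecast is scaled back by the company's own factor. All function names and the example data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Series = Dict[str, List[float]]  # company name -> traffic per time step


def normalization_factors(series: Series) -> Dict[str, float]:
    # Hypothetical choice: each company's mean traffic is its factor
    # (falling back to 1.0 so an all-zero series cannot cause division by 0).
    return {name: (mean(values) or 1.0) for name, values in series.items()}


def fit_pooled_ar1(series: Series, factors: Dict[str, float]) -> Tuple[float, float]:
    """Fit one AR(1) model x[t+1] ~ a * x[t] + b on all companies' normalized data.

    This least-squares AR(1) fit is a stand-in for the Convolutional LSTM.
    """
    xs: List[float] = []
    ys: List[float] = []
    for name, values in series.items():
        scaled = [v / factors[name] for v in values]
        xs.extend(scaled[:-1])
        ys.extend(scaled[1:])
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return a, b


def forecast_next(name: str, series: Series, factors: Dict[str, float],
                  coeffs: Tuple[float, float]) -> float:
    # Normalize the latest value, apply the shared model, then rescale.
    a, b = coeffs
    last = series[name][-1] / factors[name]
    return factors[name] * (a * last + b)


if __name__ == "__main__":
    traffic = {
        "large_co": [100.0, 200.0, 300.0, 400.0],  # hypothetical data
        "small_co": [1.0, 2.0, 3.0, 4.0],
    }
    factors = normalization_factors(traffic)
    coeffs = fit_pooled_ar1(traffic, factors)
    for company in traffic:
        print(company, forecast_next(company, traffic, factors, coeffs))
```

In this toy data the two companies differ in volume by a factor of 100 but share the same shape, so a single pooled model forecasts both (roughly 500 and 5 for the next step). Without normalization, the large company's values would dominate a least-squares fit.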
