Abstract:Deep neural network (DNN) models are effective solutions for industry 4.0 applications (\eg oil spill detection, fire detection, anomaly detection). However, training a DNN network model needs a considerable amount of data collected from various sources and transferred to the central cloud server that can be expensive and sensitive to privacy. For instance, in the remote offshore oil field where network connectivity is vulnerable, a federated fog environment can be a potential computing platform. Hence it is feasible to perform computation within the federation. On the contrary, performing a DNN model training using fog systems poses a security issue that the federated learning (FL) technique can resolve. In this case, the new challenge is the class imbalance problem that can be inherited in local data sets and can degrade the performance of the global model. Therefore, FL training needs to be performed considering the class imbalance problem locally. In addition, an efficient technique to select the relevant worker model needs to be adopted at the global level to increase the robustness of the global model. Accordingly, we utilize one of the suitable loss functions addressing the class imbalance in workers at the local level. In addition, we employ a dynamic threshold mechanism with user-defined worker's weight to efficiently select workers for aggregation that improve the global model's robustness. Finally, we perform an extensive empirical evaluation to explore the benefits of our solution and find up to 3-5% performance improvement than baseline federated learning methods.
Abstract:Cloud-based Deep Neural Network (DNN) applications that make latency-sensitive inference are becoming an indispensable part of Industry 4.0. Due to the multi-tenancy and resource heterogeneity, both inherent to the cloud computing environments, the inference time of DNN-based applications are stochastic. Such stochasticity, if not captured, can potentially lead to low Quality of Service (QoS) or even a disaster in critical sectors, such as Oil and Gas industry. To make Industry 4.0 robust, solution architects and researchers need to understand the behavior of DNN-based applications and capture the stochasticity exists in their inference times. Accordingly, in this study, we provide a descriptive analysis of the inference time from two perspectives. First, we perform an application-centric analysis and statistically model the execution time of four categorically different DNN applications on both Amazon and Chameleon clouds. Second, we take a resource-centric approach and analyze a rate-based metric in form of Million Instruction Per Second (MIPS) for heterogeneous machines in the cloud. This non-parametric modeling, achieved via Jackknife and Bootstrap re-sampling methods, provides the confidence interval of MIPS for heterogeneous cloud machines. The findings of this research can be helpful for researchers and cloud solution architects to develop solutions that are robust against the stochastic nature of the inference time of DNN applications in the cloud and can offer a higher QoS to their users and avoid unintended outcomes.