Abstract:Federated learning (FL) is a decentralized AI mechanism suitable for a large number of devices like in smart IoT. A major challenge of FL is the non-IID dataset problem, originating from the heterogeneous data collected by FL participants, leading to performance deterioration of the trained global model. There have been various attempts to rectify non-IID dataset, mostly focusing on manipulating the collected data. This paper, however, proposes a novel approach to ensure data IIDness by properly clustering and grouping mobile IoT nodes exploiting their geographical characteristics, so that each FL group can achieve IID dataset. We first provide an experimental evidence for the independence and identicalness features of IoT data according to the inter-device distance, and then propose Dynamic Clustering and Partial-Steady Grouping algorithms that partition FL participants to achieve near-IIDness in their dataset while considering device mobility. Our mechanism significantly outperforms benchmark grouping algorithms at least by 110 times in terms of the joint cost between the number of dropout devices and the evenness in per-group device count, with a mild increase in the number of groups only by up to 0.93 groups.