The proliferation of connected devices in indoor environments enables a wide range of indoor applications for which positioning services are key enablers. However, privacy concerns and resource constraints make it more challenging to design positioning systems with the accuracy that most applications require. To overcome these challenges, we present in this paper a federated learning (FL) framework for hierarchical 3D indoor localization using a deep neural network. We first highlight the importance of exploiting the hierarchy between floors and buildings in a multi-building, multi-floor indoor environment. Then, we propose an FL framework to train the designed hierarchical model. The performance evaluation shows that adopting a hierarchical learning scheme improves the localization accuracy by up to 24.06% compared to the non-hierarchical approach. We also obtain building and floor prediction accuracies of 99.90% and 94.87%, respectively. With the proposed FL framework, we achieve performance close to that of centralized training, with an increase of only 7.69% in the localization error. Moreover, the scalability study reveals that the accuracy of the FL system improves as more devices join the training.