Federated Learning (FL) is a promising machine learning approach for the Internet of Things (IoT), but it must contend with network congestion as the population of IoT devices grows. Hierarchical FL (HFL) alleviates this issue by distributing model aggregation across multiple edge servers. Nevertheless, communication overhead remains a challenge, especially when all IoT devices join the training process simultaneously. For scalability, practical HFL schemes select only a subset of IoT devices to participate in training, which gives rise to the notion of device scheduling. In this setting, only the selected IoT devices are scheduled to participate in global training, and each is assigned to one edge server. Existing HFL assignment methods are primarily based on search mechanisms, which suffer from high latency in finding the optimal assignment. This paper proposes an improved K-Center algorithm for device scheduling and introduces a deep reinforcement learning-based approach for assigning IoT devices to edge servers. Experiments show that scheduling 50% of IoT devices is generally adequate for achieving convergence in HFL with much lower time delay and energy consumption. In cases where reducing energy consumption (as in Green AI) and reducing the number of messages (to avoid burst traffic) are key objectives, scheduling 30% of IoT devices allows a substantial reduction in energy and messages with similar model accuracy.
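For readers unfamiliar with K-Center selection, the following is a minimal sketch of the classical greedy (farthest-first) K-Center heuristic applied to choosing a fraction of devices. It is not the improved variant proposed in this paper, whose details are not given here. The representation of each device as a numeric feature vector, the Euclidean distance, and the names `greedy_k_center` and `schedule_fraction` are assumptions made purely for illustration.

```python
import math


def greedy_k_center(points, k, first=0):
    """Classical farthest-first K-Center heuristic (illustrative only).

    points: list of equal-length numeric tuples, one per device
            (a hypothetical feature representation).
    k:      number of devices to select.
    first:  index of the arbitrary starting device.
    Returns the indices of the selected devices in selection order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [first]
    # dist[i] holds the distance from device i to its nearest selected device.
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # Select the device that is farthest from every device chosen so far.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return centers


def schedule_fraction(points, fraction):
    """Select roughly `fraction` of the devices (at least one)."""
    k = max(1, round(fraction * len(points)))
    return greedy_k_center(points, k)


if __name__ == "__main__":
    # Four hypothetical devices forming two well-separated groups.
    devices = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(schedule_fraction(devices, 0.5))
```

With the four hypothetical devices above, scheduling 50% selects one device from each group, which is the coverage property that motivates K-Center style selection: the scheduled subset stays representative of the full device population.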