This study investigates the integration of a high-altitude platform station (HAPS), a non-terrestrial network (NTN) node, into the cell-switching paradigm for energy saving. This integration can help achieve both the sustainability and the ubiquitous connectivity targets. A delay-aware approach is also adopted, in which the delay profiles of users are respected and their latency requirements are met on a best-effort basis. To this end, a novel, simple, and lightweight Q-learning algorithm is designed to address the cell-switching optimization problem. In the simulation campaigns, different interference scenarios and delay conditions between base stations are examined in terms of energy consumption and quality of service (QoS), and the results confirm the efficacy of the proposed Q-learning algorithm.