This study introduces the cell load estimation problem that arises in cell switching approaches for cellular networks, presented here for a high-altitude platform station (HAPS)-assisted network. The problem stems from the fact that the traffic loads of sleeping base stations in the next time slot cannot be known exactly and can only be estimated; any estimation error may cause the switching decision to diverge from the optimal one, which in turn degrades energy efficiency. These loads are required because switching decisions are made proactively in the current time slot. Two Q-learning algorithms are developed: a full-scale one that focuses solely on performance, and a lightweight one that addresses computational cost. Results confirm that estimation error can change cell switching decisions, which leads to performance divergence from the no-error scenario. Moreover, the developed Q-learning algorithms perform well, with only an insignificant difference (0.3%) observed between them and the optimum algorithm.
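
To make the effect of estimation error concrete, the following toy sketch shows how a wrong load estimate for one sleeping cell can flip a switching decision. It is not the Q-learning method of the study: it uses a plain exhaustive search, and the linear power model, the assumption that the HAPS absorbs the traffic of sleeping cells, and every constant and function name (`network_power`, `best_states`, `HAPS_CAPACITY`, and so on) are hypothetical choices made only for illustration.

```python
from itertools import product

# Illustrative assumptions only; none of these values come from the study.
P_SLEEP = 1.0        # power of a sleeping cell (W)
P_IDLE = 5.0         # load-independent power of an active cell (W)
P_SLOPE = 4.0        # extra power of an active cell at 100% load (W)
HAPS_SLOPE = 2.0     # extra HAPS power per 100% of offloaded load (W)
HAPS_CAPACITY = 85   # spare HAPS capacity, in the same percent units as loads


def network_power(states, loads):
    """Total power for the given on/off states (1 = active, 0 = sleeping)."""
    offloaded = sum(load for on, load in zip(states, loads) if not on)
    if offloaded > HAPS_CAPACITY:
        return float("inf")  # HAPS cannot absorb the sleeping cells' traffic
    cells = sum((P_IDLE + P_SLOPE * load / 100) if on else P_SLEEP
                for on, load in zip(states, loads))
    return cells + HAPS_SLOPE * offloaded / 100


def best_states(loads):
    """Exhaustive search over all on/off combinations (not Q-learning)."""
    return min(product((0, 1), repeat=len(loads)),
               key=lambda s: network_power(s, loads))


true_loads = [50, 40, 30]       # cell 0 is asleep, so its load is not observable
estimated_loads = [60, 40, 30]  # cell 0's load is overestimated

ideal = best_states(true_loads)        # decision with perfect knowledge
chosen = best_states(estimated_loads)  # decision made from the estimate
# Evaluate both decisions against the loads that actually occur.
gap = network_power(chosen, true_loads) - network_power(ideal, true_loads)
print(ideal, chosen, round(gap, 3))
```

Under these assumed numbers, perfect knowledge keeps only cell 1 active, whereas the overestimate makes that choice look infeasible and keeps cell 0 active instead, costing about 0.2 W more once the true loads materialize.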