Recent work has validated the possibility of improving energy efficiency in radio access networks (RANs) by dynamically turning some base stations (BSs) on and off. In this paper, we extend the research on BS switching operations, which should match traffic load variations. Instead of relying on forecasts of dynamic traffic loads, which remain difficult to predict precisely, we first formulate the traffic variations as a Markov decision process. Then, to minimize the energy consumption of RANs with foresight, we design a BS switching operation scheme based on a reinforcement learning framework. Furthermore, to avoid the curse of dimensionality inherent in reinforcement learning, we propose a transfer actor-critic algorithm (TACT) that utilizes learning expertise transferred from historical periods or neighboring regions and provably converges. Finally, we evaluate the proposed scheme by extensive simulations under various practical configurations and show that the TACT algorithm provides a performance jumpstart and that significant energy efficiency improvement is feasible at the expense of tolerable delay performance.
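As a rough illustration of the transfer idea, the following Python sketch runs a tabular actor-critic learner on a toy traffic model, optionally guided by action preferences transferred from another period or region. Everything in it is an assumption made for illustration and is not taken from the paper: the three load levels, the cost function `cost`, the traffic model `next_load`, the blending weight `zeta` and its decay schedule, and the parameters `alpha`, `beta`, and `gamma`. It should not be read as the TACT algorithm itself.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def cost(load, active_bs):
    """Hypothetical per-step cost: one unit of energy per active BS plus a
    delay penalty for each unit of traffic above the active capacity."""
    return 1.0 * active_bs + 3.0 * max(0, load - active_bs)

def next_load(load, n_loads, rng):
    """Toy Markov traffic model: stay with probability 0.6, else move one level."""
    if rng.random() < 0.6:
        return load
    return min(max(load + rng.choice([-1, 1]), 0), n_loads - 1)

def run_actor_critic(transferred=None, steps=2000, seed=0, n=3,
                     alpha=0.1, beta=0.05, gamma=0.9):
    """Return the list of per-step costs.
    State i means load i + 1; action j means j + 1 active BSs."""
    traffic_rng = random.Random(seed)        # same seed -> same traffic trace
    action_rng = random.Random(seed + 1)
    value = [0.0] * n                        # critic: state-value estimates
    native = [[0.0] * n for _ in range(n)]   # actor: locally learned preferences
    visits = [0] * n
    state, costs = 0, []
    for _ in range(steps):
        visits[state] += 1
        # Assumed schedule: the transfer weight decays with visits, so the
        # transferred knowledge fades as local experience accumulates.
        zeta = 1.0 / (1.0 + 0.01 * visits[state]) if transferred else 0.0
        prefs = [(1 - zeta) * native[state][a]
                 + (zeta * transferred[state][a] if transferred else 0.0)
                 for a in range(n)]
        probs = softmax(prefs)
        action = action_rng.choices(list(range(n)), weights=probs)[0]
        c = cost(state + 1, action + 1)
        nxt = next_load(state, n, traffic_rng)
        # TD error with reward = -cost: the critic updates its value estimate,
        # and the actor shifts the preference of the action just taken.
        delta = -c + gamma * value[nxt] - value[state]
        value[state] += alpha * delta
        native[state][action] += beta * delta * (1 - probs[action])
        costs.append(c)
        state = nxt
    return costs

# Hypothetical transferred expertise: strongly prefer matching active BSs to load.
expert = [[5.0 if a == s else 0.0 for a in range(3)] for s in range(3)]
cold = run_actor_critic()
warm = run_actor_critic(transferred=expert)
print(sum(cold[:100]) / 100, sum(warm[:100]) / 100)
```

Comparing the early average costs of `cold` and `warm` gives a feel for the jumpstart effect: the transferred preferences guide action selection at first, and the locally learned preferences take over as `zeta` decays.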