Abstract:The integration of low earth orbit (LEO) satellites with terrestrial communication networks holds the promise of seamless global connectivity. The efficiency of this connection, however, depends on the availability of reliable channel state information (CSI). Due to the large space-ground propagation delays, the estimated CSI is outdated. In this paper we consider the downlink of a satellite operating as a base station in support of multiple mobile users. The estimated outdated CSI is used at the satellite side to design a transmit precoding (TPC) matrix for the downlink. We propose a deep reinforcement learning (DRL)-based approach to optimize the TPC matrices, with the goal of maximizing the achievable data rate. We utilize the deep deterministic policy gradient (DDPG) algorithm to handle the continuous action space, and we employ state augmentation techniques to deal with the delayed observations and rewards. We show that the DRL agent is capable of exploiting the time-domain correlations of the channels for constructing accurate TPC matrices. This is because the proposed method is capable of compensating for the effects of delayed CSI in different frequency bands. Furthermore, we study the effect of handovers in the system, and show that the DRL agent is capable of promptly adapting to the environment when a handover occurs.