Generative Adversarial Networks (GANs) and their variants have achieved remarkable success on natural images. It aims to approximate the distribution of the training datasets. However, their performance degrades when applied to remote sensing (RS) images, and the discriminator often suffers from the overfitting problem. In this paper, we examine the differences between natural and RS images and find that the intrinsic dimensions of RS images are much lower than those of natural images. Besides, the low-dimensional data manifold of RS images may exacerbate the uneven sampling of training datasets and introduce biased information. The discriminator can easily overfit to the biased training distribution, leading to a faulty generation model, even the mode collapse problem. While existing GANs focus on the general distribution of RS datasets, they often neglect the underlying data manifold. In respond, we propose a learnable information-theoretic measure that preserves the intrinsic structures of the original data, and establish a unified GAN framework for manifold alignment in supervised and unsupervised RS image generation.