Deep learning has recently attracted significant attention in the field of hyperspectral images (HSIs) classification. However, the construction of an efficient deep neural network (DNN) mostly relies on a large number of labeled samples being available. To address this problem, this paper proposes a unified deep network, combined with active transfer learning that can be well-trained for HSIs classification using only minimally labeled training data. More specifically, deep joint spectral-spatial feature is first extracted through hierarchical stacked sparse autoencoder (SSAE) networks. Active transfer learning is then exploited to transfer the pre-trained SSAE network and the limited training samples from the source domain to the target domain, where the SSAE network is subsequently fine-tuned using the limited labeled samples selected from both source and target domain by corresponding active learning strategies. The advantages of our proposed method are threefold: 1) the network can be effectively trained using only limited labeled samples with the help of novel active learning strategies; 2) the network is flexible and scalable enough to function across various transfer situations, including cross-dataset and intra-image; 3) the learned deep joint spectral-spatial feature representation is more generic and robust than many joint spectral-spatial feature representation. Extensive comparative evaluations demonstrate that our proposed method significantly outperforms many state-of-the-art approaches, including both traditional and deep network-based methods, on three popular datasets.