Phenotype-based screening has attracted much attention for identifying cell-active compounds. Transcriptional and proteomic profiles of cell population or single cells are informative phenotypic measures of cellular responses to perturbations. In this paper, we proposed a deep learning framework based on encoder-decoder architecture that maps the initial cellular states to a latent space, in which we assume the effects of drug perturbation on cellular states follow linear additivity. Next, we introduced the cycle consistency constraints to enforce that initial cellular state subjected to drug perturbations would produce the perturbed cellular responses, and, conversely, removal of drug perturbation from the perturbed cellular states would restore the initial cellular states. The cycle consistency constraints and linear modeling in latent space enable to learn interpretable and transferable drug perturbation representations, so that our model can predict cellular response to unseen drugs. We validated our model on three different types of datasets, including bulk transcriptional responses, bulk proteomic responses, and single-cell transcriptional responses to drug perturbations. The experimental results show that our model achieves better performance than existing state-of-the-art methods.