Advances in deep learning have made it possible to generate highly realistic fake images and videos. These forgeries can reach a mass audience and have adverse impacts on society. Although considerable effort has been devoted to forgery detection, the performance of existing detectors drops significantly on previously unseen but related manipulations, and generalization remains an open problem. To bridge this gap, we propose the Locality-aware AutoEncoder (LAE), which combines fine-grained representation learning and locality enforcement in a unified framework. During training, we use pixel-wise masks to regularize the local interpretation of LAE, forcing the model to learn intrinsic representations from the forgery region rather than capturing artifacts in the training set and relying on spurious correlations for detection. We further propose an active learning framework that selects challenging candidates for labeling, reducing the annotation effort needed to regularize interpretations. Experimental results indicate that LAE indeed focuses on the forgery regions when making decisions. The results further show that LAE generalizes better than state-of-the-art methods to forgeries generated by alternative manipulation methods.
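
To make the two ideas above concrete, the following is a minimal sketch and not the LAE implementation. It assumes the model's local interpretation is available as a non-negative 2D attention map, that the regularizer penalizes attention mass falling outside the pixel-wise forgery mask, and that challenging candidates are chosen by plain uncertainty sampling. All function names, the penalty form, and the selection rule are hypothetical stand-ins chosen for illustration; the abstract does not specify them.

```python
def normalize(attention):
    """Scale a non-negative 2D attention map so its entries sum to 1."""
    total = sum(sum(row) for row in attention)
    if total <= 0:
        raise ValueError("attention map must have positive total mass")
    return [[value / total for value in row] for row in attention]


def outside_mask_penalty(attention, mask):
    """Fraction of attention mass lying outside the forgery mask.

    attention: 2D list of non-negative floats (assumed interpretation map).
    mask: 2D list of 0/1 values, where 1 marks a forged pixel.
    Returns a value in [0, 1]; 0 means all attention is inside the mask.
    """
    normalized = normalize(attention)
    return sum(
        a * (1 - m)
        for a_row, m_row in zip(normalized, mask)
        for a, m in zip(a_row, m_row)
    )


def total_loss(classification_loss, attention, mask, weight=1.0):
    """Detection loss plus the weighted locality penalty (assumed form)."""
    return classification_loss + weight * outside_mask_penalty(attention, mask)


def select_challenging(fake_probabilities, budget):
    """Indices of the `budget` samples whose predicted fake probability
    is closest to 0.5 (uncertainty sampling, an assumed criterion)."""
    ranked = sorted(
        range(len(fake_probabilities)),
        key=lambda i: abs(fake_probabilities[i] - 0.5),
    )
    return ranked[:budget]
```

Under these assumptions, a model whose attention sits entirely inside the forged region incurs no penalty, while uniform attention over an image with a small forged region is penalized heavily. Only the samples returned by `select_challenging` would need pixel-wise mask annotation.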