In this paper, we propose Point Adversarial Self Mining (PASM), a simple yet effective approach that progressively mines knowledge from training samples and produces training data for CNNs, improving both the performance and the generalization of networks on the Facial Expression Recognition (FER) task. To achieve high prediction accuracy in real-world scenarios, most existing works manipulate the network architecture or design sophisticated loss terms. Although these methods have been shown to be effective in real scenarios, they require extra effort in network design. Inspired by random erasing and adversarial erasing, we propose PASM as a data augmentation method that simulates the data distribution in the wild. Specifically, given a sample and a pre-trained network, PASM locates the informative region of the sample using a point adversarial attack policy. The located informative region is highly structured and sparse. Compared with random erasing, which selects regions purely at random, and adversarial erasing, which selects them from attention maps, the informative regions located by PASM are more adaptive and better aligned with previous findings that only a few facial regions, rather than all of them, contribute to accurate prediction. The located informative regions are then masked out of the original samples to generate augmented images, which forces the network to extract additional information from the remaining, less informative regions. The augmented images are used to finetune the network and improve its generalization. In this refinement process, we use knowledge distillation: the pre-trained network acts as a teacher that provides guidance and retains knowledge from the old samples while a new network with the same architecture is trained.
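The three steps of the pipeline (locate informative points, mask them out, finetune with distillation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PASM implementation: the abstract does not specify the attack, the region shape, or the loss. Here a linear softmax classifier stands in for the pre-trained CNN, a one-step gradient-magnitude ranking stands in for the point adversarial attack, square patches stand in for the informative regions, and a standard temperature-scaled distillation loss (weights `T` and `alpha`) stands in for the refinement objective. The names `locate_points`, `mask_regions`, and `distillation_loss` are hypothetical.

```python
import numpy as np


def softmax(z, T=1.0):
    # Numerically stable softmax with optional temperature T.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def locate_points(image, W, b, label, k=3):
    # Hypothetical stand-in for the point adversarial attack: for a linear
    # softmax classifier (logits = W x + b), rank pixels by |dCE/dx| and
    # return the k pixels whose perturbation most affects the loss.
    x = image.reshape(-1)
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    grad = W.T @ (p - onehot)  # gradient of cross-entropy w.r.t. the input
    idx = np.argsort(-np.abs(grad))[:k]
    return [np.unravel_index(i, image.shape) for i in idx]


def mask_regions(image, points, radius=1, fill=0.0):
    # Erase a (2*radius+1)^2 square around each located point (assumed shape).
    out = image.copy()
    H, Wd = image.shape
    for r, c in points:
        out[max(r - radius, 0):min(r + radius + 1, H),
            max(c - radius, 0):min(c + radius + 1, Wd)] = fill
    return out


def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    # Assumed Hinton-style objective: hard-label cross-entropy on the student
    # plus T^2-scaled KL(teacher || student) on temperature-softened outputs.
    ce = -np.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    kl = np.sum(p_t * (np.log(p_t) - log_p_s))
    return (1 - alpha) * ce + alpha * (T ** 2) * kl


# Usage on toy data: an 8x8 "face" and 7 expression classes (both assumed).
rng = np.random.default_rng(0)
img = rng.random((8, 8))
W = rng.normal(size=(7, 64))
b = np.zeros(7)
pts = locate_points(img, W, b, label=2, k=3)
aug = mask_regions(img, pts, radius=1)
teacher = W @ img.reshape(-1) + b  # teacher sees the original sample
student = W @ aug.reshape(-1) + b  # student sees the augmented sample
loss = distillation_loss(student, teacher, label=2)
```

In a real setting, the teacher would be the frozen pre-trained network, the student a freshly initialized network with the same architecture, and the loss would be minimized by backpropagation over many augmented samples.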