In recent years, video surveillance has become increasingly popular as an aid to railway intrusion detection. However, efficient and accurate intrusion detection remains challenging for three reasons: (a) limited samples: only a small number (or proportion) of video frames contain intrusions; (b) low inter-scene similarity: cameras installed in different landforms capture widely varying railway track-area scenes; (c) high intra-scene similarity: the video frames captured by an individual camera share the same background. In this paper, an efficient few-shot learning solution is developed to address these issues. In particular, an enhanced model-agnostic meta-learner is trained using both the original video frames and segmentation masks of the track area extracted from the video. Moreover, theoretical analysis and engineering solutions are provided to cope with the highly similar video frames during the meta-model training phase. The proposed method is tested on a real-world railway video dataset. Numerical results show that the enhanced meta-learner successfully adapts to unseen scenes with only a few newly collected video frames, and that its intrusion detection accuracy exceeds that of standard supervised learning from a random initialization.
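The abstract does not say how frames and masks are combined or which network is meta-trained, so the following is only a hedged illustration of the general idea, not the authors' enhanced meta-learner. It is a minimal sketch of first-order MAML (a simplification of model-agnostic meta-learning that drops second-order terms), assuming a logistic classifier on precomputed features in place of a deep network. Each "task" is one camera scene with a small support set (used for adaptation) and a query set (used to score the adapted model). The names `make_input` (fusing frame and track-mask features by concatenation) and `make_scene` (synthetic scene data) are hypothetical stand-ins.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grad(w, X, y):
    """Binary cross-entropy of a logistic classifier and its gradient w.r.t. w."""
    p = sigmoid(X @ w)
    eps = 1e-12  # avoid log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def adapt(w, X, y, inner_lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one scene's small support set."""
    for _ in range(steps):
        _, g = loss_and_grad(w, X, y)
        w = w - inner_lr * g
    return w


def make_input(frame_feat, mask_feat):
    """Hypothetical fusion: concatenate frame features with track-mask features."""
    return np.concatenate([frame_feat, mask_feat], axis=1)


def make_scene(rng, shared_w, n, frame_dim=4):
    """Hypothetical synthetic scene standing in for one camera's labelled frames."""
    dim = shared_w.size
    w_scene = shared_w + 0.3 * rng.normal(size=dim)  # scene-specific shift

    def sample(k):
        frame = rng.normal(size=(k, frame_dim))
        mask = rng.normal(size=(k, dim - frame_dim))
        X = make_input(frame, mask)
        return X, (X @ w_scene > 0).astype(float)  # 1 = intrusion, 0 = clear

    Xs, ys = sample(n)  # support set: the few frames used for adaptation
    Xq, yq = sample(n)  # query set: frames used to score the adapted model
    return Xs, ys, Xq, yq


def fomaml(tasks, dim, meta_lr=0.1, inner_lr=0.1, inner_steps=5, epochs=100, seed=0):
    """First-order MAML: learn an initialisation that adapts quickly to each scene."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=dim)
    for _ in range(epochs):
        meta_grad = np.zeros(dim)
        for Xs, ys, Xq, yq in tasks:
            w_task = adapt(w, Xs, ys, inner_lr, inner_steps)
            _, g_q = loss_and_grad(w_task, Xq, yq)
            meta_grad += g_q  # first-order: second derivatives are dropped
        w = w - meta_lr * meta_grad / len(tasks)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    shared = rng.normal(size=8)  # structure common to all scenes
    train_scenes = [make_scene(rng, shared, n=10) for _ in range(5)]
    w_meta = fomaml(train_scenes, dim=8)

    Xs, ys, Xq, yq = make_scene(rng, shared, n=5)  # unseen scene, 5 support frames
    w_rand = np.random.default_rng(7).normal(scale=0.01, size=8)
    for name, w0 in [("meta-init", w_meta), ("random-init", w_rand)]:
        q_loss, _ = loss_and_grad(adapt(w0, Xs, ys), Xq, yq)
        print(f"{name}: query loss after adaptation = {q_loss:.3f}")
```

The demo compares adaptation from the meta-learned initialization against a random one on an unseen synthetic scene; the outcome varies with the random draw and says nothing about the paper's reported results.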