Surface defect detection is essential and necessary for controlling the qualities of the products during manufacturing. The challenges in this complex task include: 1) collecting defective samples and manually labeling for training is time-consuming; 2) the defects' characteristics are difficult to define as new types of defect can happen all the time; 3) and the real-world product images contain lots of background noise. In this paper, we present a two-stage defect detection network based on the object detection model YOLO, and the normalizing flow-based defect detection model DifferNet. Our model has high robustness and performance on defect detection using real-world video clips taken from a production line monitoring system. The normalizing flow-based anomaly detection model only requires a small number of good samples for training and then perform defect detection on the product images detected by YOLO. The model we invent employs two novel strategies: 1) a two-stage network using YOLO and a normalizing flow-based model to perform product defect detection, 2) multi-scale image transformations are implemented to solve the issue product image cropped by YOLO includes many background noise. Besides, extensive experiments are conducted on a new dataset collected from the real-world factory production line. We demonstrate that our proposed model can learn on a small number of defect-free samples of single or multiple product types. The dataset will also be made public to encourage further studies and research in surface defect detection.