In this paper, we propose a simple yet effective solution to a change detection task that detects the difference between two images, which we call "spot the difference". Our approach uses CNN-based object detection by stacking two aligned images as input and considering the differences between the two images as objects to detect. An early-merging architecture is used as the backbone network. Our method is accurate, fast and robust while using very cheap annotation. We verify the proposed method on the task of change detection between the digital design and its photographic image of a book. Compared to verification based methods, our object detection based method outperforms other methods by a large margin and gives extra information of location. We compress the network and achieve 24 times acceleration while keeping the accuracy. Besides, as we synthesize the training data for detection using weakly labeled images, our method does not need expensive bounding box annotation.