Frame duplication is a video manipulation in which a sequence of consecutive frames is selected, copied, and then inserted at, or used to replace the frames at, another position in the same source video in order to conceal or imitate a specific event or content. To automatically detect and localize frame duplications in a manipulated video, we propose a coarse-to-fine deep convolutional neural network framework. We first run an I3D network to obtain the most likely candidate duplicated frame sequences and their corresponding selected frame sequences, and then run a ResNet-based Siamese network to identify each pair of a duplicated frame and its corresponding selected frame. We also propose a heuristic strategy to formulate the video-level score. Finally, we apply an inconsistency detector, fine-tuned from the I3D network, to distinguish duplicated frames from selected frames. Experimental evaluation on two video datasets demonstrates that the proposed method outperforms current state-of-the-art methods.
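The coarse-to-fine idea can be illustrated with a minimal sketch. It is not the proposed networks: mean-pooled random vectors stand in for I3D sequence features, a plain Euclidean distance stands in for the Siamese comparison, and the function names, window length, and threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def window_features(frame_feats, win):
    """Mean-pool per-frame features over sliding windows (stand-in for I3D sequence features)."""
    n = len(frame_feats) - win + 1
    return np.stack([frame_feats[i:i + win].mean(axis=0) for i in range(n)])

def coarse_candidates(clip_feats, win, top_k=3):
    """Coarse stage: the top_k non-overlapping window pairs (i, j), i < j, with the smallest distance."""
    diff = clip_feats[:, None, :] - clip_feats[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # k=win keeps only pairs with j - i >= win, so the two windows never overlap
    i_idx, j_idx = np.triu_indices(len(clip_feats), k=win)
    order = np.argsort(dist[i_idx, j_idx])[:top_k]
    return [(int(i_idx[o]), int(j_idx[o])) for o in order]

def fine_match(frame_feats, i, j, win, thresh=1e-3):
    """Fine stage: frame-by-frame comparison of one candidate pair (stand-in for the Siamese network)."""
    d = np.linalg.norm(frame_feats[i:i + win] - frame_feats[j:j + win], axis=1)
    return [(i + t, j + t) for t in range(win) if d[t] < thresh]

# Synthetic example: 60 frames with hypothetical 16-dimensional embeddings
rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 16))
feats[40:48] = feats[10:18]  # simulate copying frames 10..17 over frames 40..47

win = 8
clips = window_features(feats, win)
best_i, best_j = coarse_candidates(clips, win, top_k=1)[0]
pairs = fine_match(feats, best_i, best_j, win)
```

On this synthetic input the coarse stage returns the window pair starting at frames 10 and 40, and the fine stage aligns the eight copied frames one to one. Deciding which of the two sequences is the duplicate and which is the selected one is the role of the inconsistency detector and is not covered by this sketch.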