Abstract:The quality of frames is significant for both research and application of video frame interpolation (VFI). In recent VFI studies, the methods of full-reference image quality assessment have generally been used to evaluate the quality of VFI frames. However, high frame rate reference videos, necessities for the full-reference methods, are difficult to obtain in most applications of VFI. To evaluate the quality of VFI frames without reference videos, a no-reference perceptual quality assessment method is proposed in this paper. This method is more compatible with VFI application and the evaluation scores from it are consistent with human subjective opinions. A new quality assessment dataset for VFI was constructed through subjective experiments firstly, to assess the opinion scores of interpolated frames. The dataset was created from triplets of frames extracted from high-quality videos using 9 state-of-the-art VFI algorithms. The proposed method evaluates the perceptual coherence of frames incorporating the original pair of VFI inputs. Specifically, the method applies a triplet network architecture, including three parallel feature pipelines, to extract the deep perceptual features of the interpolated frame as well as the original pair of frames. Coherence similarities of the two-way parallel features are jointly calculated and optimized as a perceptual metric. In the experiments, both full-reference and no-reference quality assessment methods were tested on the new quality dataset. The results show that the proposed method achieves the best performance among all compared quality assessment methods on the dataset.