Abstract:The study of video prediction models is believed to be a fundamental approach to representation learning for videos. While a plethora of generative models for predicting the future frame pixel values given the past few frames exist, the quantitative evaluation of the predicted frames has been found to be extremely challenging. In this context, we introduce the problem of naturalness evaluation, which refers to how natural or realistic a predicted video looks. We create the Indian Institute of Science Video Naturalness Evaluation (IISc VINE) Database consisting of 300 videos, obtained by applying different prediction models on different datasets, and accompanying human opinion scores. 50 human subjects participated in our study yielding around 6000 human ratings of naturalness. Our subjective study reveals that human observers show a highly consistent judgement of naturalness. We benchmark several popularly used measures for evaluating video prediction and show that they do not adequately correlate with the subjective scores. We introduce two new features to help effectively capture naturalness. In particular, we show that motion compensated cosine similarities of deep features of predicted frames with past frames and deep features extracted from rescaled frame differences lead to state of the art naturalness prediction in accordance with human judgements. The database and code will be made publicly available at our project website: https://sites.google.com/site/nagabhushansn95/publications/vine.