Abstract:Highlight detection in sports videos has a broad viewership and huge commercial potential. It is thus imperative to detect highlight scenes more suitably for human interest with high temporal accuracy. Since people instinctively suppress blinks during attention-grabbing events and synchronously generate blinks at attention break points in videos, the instantaneous blink rate can be utilized as a highly accurate temporal indicator of human interest. Therefore, in this study, we propose a novel, automatic highlight detection method based on the blink rate. The method trains a one-dimensional convolution network (1D-CNN) to assess blink rates at each video frame from the spatio-temporal pose features of figure skating videos. Experiments show that the method successfully estimates the blink rate in 94% of the video clips and predicts the temporal change in the blink rate around a jump event with high accuracy. Moreover, the method detects not only the representative athletic action, but also the distinctive artistic expression of figure skating performance as key frames. This suggests that the blink-rate-based supervised learning approach enables high-accuracy highlight detection that more closely matches human sensibility.