Abstract:Primary object segmentation plays an important role in understanding videos generated by unmanned aerial vehicles. In this paper, we propose a large-scale dataset with 500 aerial videos and manually annotated primary objects. To the best of our knowledge, it is the largest dataset to date for primary object segmentation in aerial videos. From this dataset, we find most aerial videos contain large-scale scenes, small primary objects as well as consistently varying scales and viewpoints. Inspired by that, we propose a hierarchical deep co-segmentation approach that repeatedly divides a video into two sub-videos formed by the odd and even frames, respectively. In this manner, the primary objects shared by sub-videos can be co-segmented by training two-stream CNNs and finally refined within the neighborhood reversible flows. Experimental results show that our approach remarkably outperforms 17 state-of-the-art methods in segmenting primary objects in various types of aerial videos.