Abstract: We present a novel, simple yet effective algorithm for motion-based video frame interpolation. Existing motion-based interpolation methods typically rely on a pre-trained optical flow model or a U-Net-based pyramid network for motion estimation, and thus suffer from either large model size or limited capacity to handle complex and large motions. In this work, by carefully integrating intermediate-oriented forward-warping, a lightweight feature encoder, and a correlation volume into a pyramid recurrent framework, we derive a compact model that simultaneously estimates the bi-directional motion between input frames. It is 15 times smaller than PWC-Net, yet handles challenging motion cases more reliably and flexibly. Based on the estimated bi-directional motion, we forward-warp the input frames and their context features to the intermediate frame, and employ a synthesis network to estimate the intermediate frame from the warped representations. Our method achieves excellent performance on a broad range of video frame interpolation benchmarks. Code will be available soon.
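The forward-warping step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `forward_warp`, the nearest-pixel splatting, and the averaging of colliding pixels are all assumptions made for illustration, and practical systems typically use differentiable splatting on feature tensors. Each source pixel is pushed along its flow vector, scaled by the target time `t`, into the intermediate frame.

```python
import math


def forward_warp(frame, flow, t):
    """Forward-warp a grayscale frame toward time t in [0, 1].

    frame: 2D list of floats, indexed frame[y][x].
    flow:  2D list of (dx, dy) tuples, motion from this frame to the other.
    Assumption (illustrative only): nearest-pixel splatting, with pixels that
    land on the same target averaged. Targets that receive no pixel are None.
    """
    h, w = len(frame), len(frame[0])
    acc = [[0.0] * w for _ in range(h)]   # sum of splatted values
    cnt = [[0] * w for _ in range(h)]     # number of contributions
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Scale the motion by t to reach the intermediate frame.
            tx = math.floor(x + t * dx + 0.5)
            ty = math.floor(y + t * dy + 0.5)
            if 0 <= tx < w and 0 <= ty < h:
                acc[ty][tx] += frame[y][x]
                cnt[ty][tx] += 1
    return [
        [acc[y][x] / cnt[y][x] if cnt[y][x] else None for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    frame = [[1.0, 2.0, 3.0, 4.0]]
    flow = [[(2.0, 0.0)] * 4]             # every pixel moves 2 px to the right
    print(forward_warp(frame, flow, 0.5))  # halfway: a shift of 1 px
```

Unlike backward warping, forward warping can leave holes (the `None` entries) and collisions, which is one reason a synthesis network is used to produce the final frame from the warped representations.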
Abstract: In the Internet era, recognizing pornographic images is of great significance for protecting children's physical and mental health. However, this task is very challenging because the key pornographic contents (e.g., breast and private part) in an image often lie in small local regions. In this paper, we model each image as a bag of regions and follow a multiple instance learning (MIL) approach to train a generic region-based recognition model. Specifically, we take into account each region's degree of pornography, and make three main contributions. First, we show that, based on very few annotations of the key pornographic contents in a training image, we can generate a bag of properly sized regions, among which the potential positive regions usually contain useful context that can aid recognition. Second, we present a simple quantitative measure of a region's degree of pornography, which can be used to weight the importance of different regions in a positive image. Third, we formulate the recognition task as a weighted MIL problem under the convolutional neural network framework, with a bag probability function introduced to combine the importance of different regions. Experiments on our newly collected large-scale dataset demonstrate the effectiveness of the proposed method, which achieves a 97.52% true positive rate at a 1% false positive rate when tested on 100K pornographic images and 100K normal images.
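The abstract above does not give the form of the bag probability function, so the following is only a sketch of one common choice, a weighted noisy-OR, and should not be read as the authors' formulation. The name `bag_probability` and the use of per-region weights as exponents are assumptions made for illustration. The bag (image) is positive if at least one region is positive, and a region's weight controls how strongly its probability counts.

```python
import math


def bag_probability(region_probs, region_weights):
    """Weighted noisy-OR over a bag of regions (illustrative assumption).

    Computes 1 - prod_i (1 - p_i) ** w_i, where p_i is the predicted
    probability that region i is positive and w_i >= 0 is its weight.
    A weight of 0 removes the region's influence entirely.
    """
    if len(region_probs) != len(region_weights):
        raise ValueError("region_probs and region_weights must match in length")
    log_all_negative = 0.0
    for p, w in zip(region_probs, region_weights):
        # Clamp to keep log1p finite when p is exactly 1.
        p = min(max(p, 0.0), 1.0 - 1e-12)
        log_all_negative += w * math.log1p(-p)
    return 1.0 - math.exp(log_all_negative)


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.1]
    print(bag_probability(probs, [1.0, 1.0, 1.0]))  # plain noisy-OR
    print(bag_probability(probs, [0.1, 1.0, 1.0]))  # first region down-weighted
```

Summing in log space avoids underflow when a bag contains many regions, and down-weighting a high-probability region lowers the bag probability, which matches the intended role of region importance.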