Abstract: Chinese traditional painting is one of the most historic art forms in the world. It is very popular in East and Southeast Asia for its aesthetic appeal. Compared with Western painting, it is usually more visually abstract and textureless. Recently, neural-network-based style transfer methods have shown promising results, but they focus mainly on Western painting. Preserving abstraction in neural style transfer remains a challenging problem. In this paper, we present a Neural Abstract Style Transfer method for Chinese traditional painting. It learns to preserve abstraction and other style attributes jointly, end-to-end, via a novel MXDoG-guided filter (MXDoG: a Modified version of the eXtended Difference-of-Gaussians) and three fully differentiable loss terms. To the best of our knowledge, little work has studied neural style transfer for Chinese traditional painting. To promote research in this direction, we collect a new dataset of diverse photo-realistic images and Chinese traditional paintings. In experiments, the proposed method produces more appealing stylized results when transferring the style of Chinese traditional painting than state-of-the-art neural style transfer methods.
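For reference, the standard eXtended Difference-of-Gaussians (XDoG) operator on which MXDoG builds can be sketched as follows. This is the commonly cited XDoG formulation, not the modified filter proposed in the paper, whose details the abstract does not give. The symbols $I$ (input image), $G_{\sigma}$ (Gaussian kernel of standard deviation $\sigma$), $k$, $p$, $\varepsilon$, and $\varphi$ are notation assumed here for illustration.

\begin{equation}
S_{\sigma,k,p}(x) = (1+p)\,(G_{\sigma} * I)(x) - p\,(G_{k\sigma} * I)(x)
\end{equation}
\begin{equation}
T_{\varepsilon,\varphi}(u) = \left\{
\begin{array}{ll}
1, & u \ge \varepsilon \\
1 + \tanh\bigl(\varphi\,(u - \varepsilon)\bigr), & \mbox{otherwise}
\end{array}
\right.
\end{equation}

Here $S$ sharpens the plain difference of two Gaussian-blurred copies of the image, and $T$ is a soft threshold applied to $S$. Because $T$ is built from $\tanh$, it is differentiable almost everywhere, which is the kind of property an end-to-end trained filter would presumably need.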
Abstract: Region-based convolutional neural networks (R-CNN)~\cite{fast_rcnn,faster_rcnn,mask_rcnn} have largely dominated object detection. Operators defined on RoIs (Regions of Interest), such as RoIPooling~\cite{fast_rcnn} and RoIAlign~\cite{mask_rcnn}, play an important role in R-CNNs. All of them use only information inside RoIs for RoI prediction, even in their recent deformable extensions~\cite{deformable_cnn}. Although surrounding context is well known to be important in object detection, it has yet to be integrated into R-CNNs in a flexible and effective way. Inspired by the auto-context work~\cite{auto_context} and the multi-class object layout work~\cite{nms_context}, this paper presents a generic context-mining RoI operator (i.e., \textit{RoICtxMining}) that is seamlessly integrated into R-CNNs. The resulting object detection system, termed \textbf{Auto-Context R-CNN}, is trained end-to-end. The proposed RoICtxMining operator is a simple yet effective two-layer extension of the RoIPooling or RoIAlign operator. Centered at an object-RoI, it creates a $3\times 3$ layout to mine contextual information adaptively, on-the-fly, in the $8$ surrounding context regions. Within each of the $8$ context regions, a context-RoI is mined in terms of discriminative power, and its RoIPooling / RoIAlign features are concatenated with those of the object-RoI for final prediction. \textit{The proposed Auto-Context R-CNN is robust to occlusion and small objects, and shows promising robustness to adversarial attacks without being adversarially trained.} In experiments, it is evaluated using RoIPooling as the backbone and shows competitive results on the Pascal VOC, Microsoft COCO, and KITTI datasets (including a $6.9\%$ mAP improvement over the R-FCN~\cite{rfcn} method on the COCO \textit{test-dev} set and first place in both KITTI pedestrian and cyclist detection as of this submission).
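The $3\times 3$ layout described above can be illustrated with a minimal Python sketch. It only enumerates the $8$ surrounding context regions; it does not perform the mining of a context-RoI inside each region. The function name `context_regions`, the assumption that every cell has the same size as the object-RoI, and the clipping to image bounds are all hypothetical choices made for illustration, not details taken from the paper.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def context_regions(roi: Box, image_w: float, image_h: float) -> Dict[Tuple[int, int], Box]:
    """Return the 8 neighbor cells of a 3x3 layout centered at `roi`.

    Assumption (not from the paper): each cell has the same width and
    height as the object-RoI, and cells are clipped to the image.
    Keys are (row_offset, col_offset) with offsets in {-1, 0, 1}.
    """
    x1, y1, x2, y2 = roi
    w, h = x2 - x1, y2 - y1
    regions: Dict[Tuple[int, int], Box] = {}
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # the center cell is the object-RoI itself
            # Shift the RoI by one cell, then clip each coordinate to the image.
            cx1 = min(max(x1 + dc * w, 0.0), image_w)
            cy1 = min(max(y1 + dr * h, 0.0), image_h)
            cx2 = min(max(x2 + dc * w, 0.0), image_w)
            cy2 = min(max(y2 + dr * h, 0.0), image_h)
            regions[(dr, dc)] = (cx1, cy1, cx2, cy2)
    return regions
```

For an object-RoI near the image border, some regions collapse to zero area after clipping; a real implementation would need a policy for such cells (for example, skipping or padding them).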
Abstract: Jointly integrating aspect ratio and context has been extensively studied and shown to improve performance in traditional object detection systems such as deformable part models (DPMs). It has, however, been largely ignored in deep neural network based detection systems. This paper presents a method that integrates a mixture of object models with region-based convolutional networks for accurate object detection. Each mixture component explicitly accounts for both object aspect ratio and multi-scale contextual information: (i) it exploits a mixture of tiling configurations in the RoI pooling to remedy the warping artifacts caused by a single type of RoI pooling (e.g., with equally sized $7 \times 7$ cells) and to better respect the underlying object shapes; (ii) it "looks from both the inside and the outside of a RoI" by incorporating contextual information at two scales: global context pooled from the whole image and local context pooled from the surroundings of a RoI. To facilitate accurate detection, this paper proposes a multi-stage detection scheme for integrating the mixture of object models, which uses the detection results of the model at the previous stage as the proposals for the current stage in both training and testing. The proposed method is called the aspect ratio and context aware region-based convolutional network (ARC-R-CNN). In experiments, ARC-R-CNN shows results that are very competitive with Faster R-CNN [41] and R-FCN [10] on two datasets: PASCAL VOC and Microsoft COCO. It obtains significantly better mAP at high IoU thresholds on both datasets.
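The two geometric ingredients named above, tiling configurations and two-scale context, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the ARC-R-CNN implementation: the helper names `tile_roi`, `local_context_box`, and `global_context_box`, the default `scale` of 1.5, and any tiling other than $7 \times 7$ are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def tile_roi(roi: Box, rows: int, cols: int) -> List[Box]:
    """Split `roi` into a rows x cols grid of equally sized pooling cells (row-major)."""
    x1, y1, x2, y2 = roi
    cell_w = (x2 - x1) / cols
    cell_h = (y2 - y1) / rows
    return [
        (x1 + c * cell_w, y1 + r * cell_h, x1 + (c + 1) * cell_w, y1 + (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def local_context_box(roi: Box, image_w: float, image_h: float, scale: float = 1.5) -> Box:
    """Enlarge `roi` about its center by `scale` (a hypothetical value) and clip to the image."""
    x1, y1, x2, y2 = roi
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = scale * (x2 - x1) / 2
    half_h = scale * (y2 - y1) / 2
    return (
        max(cx - half_w, 0.0),
        max(cy - half_h, 0.0),
        min(cx + half_w, image_w),
        min(cy + half_h, image_h),
    )


def global_context_box(image_w: float, image_h: float) -> Box:
    """Global context is pooled from the whole image."""
    return (0.0, 0.0, image_w, image_h)
```

As a usage example, `tile_roi(roi, 7, 7)` gives the single square tiling mentioned in the text, while a tall object might instead use something like `tile_roi(roi, 9, 5)`, so that the cells stay closer to square and the pooled features are warped less. The specific non-square tilings used by the method are not given in the abstract.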