Abstract:With the emerging diffusion models, recently, text-to-video generation has aroused increasing attention. But an important bottleneck therein is that generative videos often tend to carry some flickers and artifacts. In this work, we propose a dual-stream diffusion net (DSDN) to improve the consistency of content variations in generating videos. In particular, the designed two diffusion streams, video content and motion branches, could not only run separately in their private spaces for producing personalized video variations as well as content, but also be well-aligned between the content and motion domains through leveraging our designed cross-transformer interaction module, which would benefit the smoothness of generated videos. Besides, we also introduce motion decomposer and combiner to faciliate the operation on video motion. Qualitative and quantitative experiments demonstrate that our method could produce amazing continuous videos with fewer flickers.
Abstract:The security of the Person Re-identification(ReID) model plays a decisive role in the application of ReID. However, deep neural networks have been shown to be vulnerable, and adding undetectable adversarial perturbations to clean images can trick deep neural networks that perform well in clean images. We propose a ReID multi-modal data augmentation method with adversarial defense effect: 1) Grayscale Patch Replacement, it consists of Local Grayscale Patch Replacement(LGPR) and Global Grayscale Patch Replacement(GGPR). This method can not only improve the accuracy of the model, but also help the model defend against adversarial examples; 2) Multi-Modal Defense, it integrates three homogeneous modal images of visible, grayscale and sketch, and further strengthens the defense ability of the model. These methods fuse different modalities of homogeneous images to enrich the input sample variety, the variaty of samples will reduce the over-fitting of the ReID model to color variations and make the adversarial space of the dataset that the attack method can find difficult to align, thus the accuracy of model is improved, and the attack effect is greatly reduced. The more modal homogeneous images are fused, the stronger the defense capabilities is . The proposed method performs well on multiple datasets, and successfully defends the attack of MS-SSIM proposed by CVPR2020 against ReID [10], and increases the accuracy by 467 times(0.2% to 93.3%).The code is available at https://github.com/finger-monkey/ReID_Adversarial_Defense.
Abstract:In order to make full use of structural information of grayscale images and reduce adverse impact of illumination variation for person re-identification (ReID), an effective data augmentation method is proposed in this paper, which includes Random Grayscale Transformation, Random Grayscale Patch Replacement and their combination. It is discovered that structural information has a significant effect on the ReID model performance, and it is very important complementary to RGB images ReID. During ReID model training, on the one hand, we randomly selected a rectangular area in the RGB image and replace its color with the same rectangular area grayscale in corresponding grayscale image, thus we generate a training image with different grayscale areas; On the other hand, we convert an image into a grayscale image. These two methods will reduce the risk of overfitting the model due to illumination variations and make the model more robust to cross-camera. The experimental results show that our method achieves a performance improvement of up to 3.3%, achieving the highest retrieval accuracy currently on multiple datasets.