Abstract:With the development of Generative Adversarial Network, image-based virtual try-on methods have made great progress. However, limited work has explored the task of video-based virtual try-on while it is important in real-world applications. Most existing video-based virtual try-on methods usually require clothing templates and they can only generate blurred and low-resolution results. To address these challenges, we propose a Memory-based Video virtual Try-On Network (MV-TON), which seamlessly transfers desired clothes to a target person without using any clothing templates and generates high-resolution realistic videos. Specifically, MV-TON consists of two modules: 1) a try-on module that transfers the desired clothes from model images to frame images by pose alignment and region-wise replacing of pixels; 2) a memory refinement module that learns to embed the existing generated frames into the latent space as external memory for the following frame generation. Experimental results show the effectiveness of our method in the video virtual try-on task and its superiority over other existing methods.