Abstract: Translation-based Video Synthesis (TVS) has emerged as a vital research area in computer vision, aiming to transform videos between distinct domains while preserving both temporal continuity and underlying content features. By extending traditional image-to-image translation to the temporal domain, the technique has found wide-ranging applications, including video super-resolution, colorization, and segmentation. A principal challenge in TVS is the risk of introducing flickering artifacts and frame-to-frame inconsistencies during synthesis, because transitions between consecutive frames must remain smooth and coherent. Efforts to address this challenge have led to diverse strategies and algorithms for mitigating these artifacts. This review examines recent progress in TVS, investigating emerging methodologies and the fundamental concepts and mechanisms they use for video synthesis. It also discusses their strengths, limitations, appropriate applications, and potential avenues for future development.