Abstract:Generating realistic animated videos from static images is an important area of research in computer vision. Methods based on physical simulation and motion prediction have achieved notable advances, but they are often limited to specific object textures and motion trajectories, failing to exhibit highly complex environments and physical dynamics. In this paper, we introduce an open-domain controllable image animation method using motion priors with video diffusion models. Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos and learning moving trajectories and strengths. Current pretrained video generation models are typically limited to producing very short videos, typically less than 30 frames. In contrast, we propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks, facilitating the creation of videos over 100 frames in length while maintaining consistency in content scenery and motion coordination. Specifically, we decompose the denoise process into two distinct phases: the shaping of scene contours and the refining of motion details. Then we reschedule the noise to control the generated frame sequences maintaining long-distance noise correlation. We conducted extensive experiments with 10 baselines, encompassing both commercial tools and academic methodologies, which demonstrate the superiority of our method. Our project page: https://wangqiang9.github.io/Controllable.github.io/
Abstract:One of the challenges of logo recognition lies in the diversity of forms, such as symbols, texts or a combination of both; further, logos tend to be extremely concise in design while similar in appearance, suggesting the difficulty of learning discriminative representations. To investigate the variety and representation of logo, we introduced Makeup216, the largest and most complex logo dataset in the field of makeup, captured from the real world. It comprises of 216 logos and 157 brands, including 10,019 images and 37,018 annotated logo objects. In addition, we found that the marginal background around the pure logo can provide a important context information and proposed an adversarial attention representation framework (AAR) to attend on the logo subject and auxiliary marginal background separately, which can be combined for better representation. Our proposed framework achieved competitive results on Makeup216 and another large-scale open logo dataset, which could provide fresh thinking for logo recognition. The dataset of Makeup216 and the code of the proposed framework will be released soon.