Motion transfer is the task of synthesizing future video frames from a single source image according to the motion in a given driving video. Solving it requires dealing with the complexity of motion representation and with the unknown relation between the driving video and the source image. Despite its difficulty, this problem has attracted great interest from researchers in recent years, with gradual improvements. The goal is often thought of as decoupling motion from appearance, which may be achieved by extracting the motion from keypoint movement. We choose to tackle the generic, unsupervised setting, in which animation must be applied to an arbitrary object, without any domain-specific model of the structure of the input. In this work, we extract the structure from a keypoint heatmap, without an explicit motion representation. The structures extracted from the image and from the video are then used by a deep generator to warp the image according to the video. We suggest two variants of the structure, taken from different steps in the keypoint module, and show qualitatively superior poses and superior quantitative scores.