Abstract:Disentangling facial movements from other facial characteristics, particularly from facial identity, remains a challenging task, as facial movements display great variation between individuals. In this paper, we aim to characterize individual-specific facial movements. We present a novel training approach to learn facial movements independently of other facial characteristics, focusing on each individual separately. We propose self-supervised Person-Specific Models (PSMs), in which one model per individual can learn to extract an embedding of the facial movements independently of the person's identity and other structural facial characteristics from unlabeled facial video. These models are trained using encoder-decoder-like architectures. We provide quantitative and qualitative evidence that a PSM learns a meaningful facial embedding that discovers fine-grained movements otherwise not characterized by a General Model (GM), which is trained across individuals and characterizes general patterns of facial movements. We present quantitative and qualitative evidence that this approach is easily scalable and generalizable for new individuals: facial movements knowledge learned on a person can quickly and effectively be transferred to a new person. Lastly, we propose a novel PSM using curriculum temporal learning to leverage the temporal contiguity between video frames. Our code, analysis details, and all pretrained models are available in Github and Supplementary Materials.