Abstract:We present a novel end-to-end identity-agnostic face reenactment system, MaskRenderer, that can generate realistic, high fidelity frames in real-time. Although recent face reenactment works have shown promising results, there are still significant challenges such as identity leakage and imitating mouth movements, especially for large pose changes and occluded faces. MaskRenderer tackles these problems by using (i) a 3DMM to model 3D face structure to better handle pose changes, occlusion, and mouth movements compared to 2D representations; (ii) a triplet loss function to embed the cross-reenactment during training for better identity preservation; and (iii) multi-scale occlusion, improving inpainting and restoring missing areas. Comprehensive quantitative and qualitative experiments conducted on the VoxCeleb1 test set, demonstrate that MaskRenderer outperforms state-of-the-art models on unseen faces, especially when the Source and Driving identities are very different.