Sparse representation-based classification (SRC) has been shown to achieve a high level of accuracy in face recognition (FR). However, matching faces captured in unconstrained video against a gallery with a single reference facial still per individual typically yields low accuracy. For improved robustness to intra-class variations, SRC techniques for FR have recently been extended to incorporate variational information from an external generic set into an auxiliary dictionary. Despite their success in handling linear variations, non-linear variations (e.g., pose and expressions) between probe and reference facial images cannot be accurately reconstructed with a linear combination of images in the gallery and auxiliary dictionaries because they do not share the same type of variations. In order to account for non-linear variations due to pose, a paired sparse representation model is introduced allowing for joint use of variational information and synthetic face images. The proposed model, called synthetic plus variational model, reconstructs a probe image by jointly using (1) a variational dictionary and (2) a gallery dictionary augmented with a set of synthetic images generated over a wide diversity of pose angles. The augmented gallery dictionary is then encouraged to pair the same sparsity pattern with the variational dictionary for similar pose angles by solving a newly formulated simultaneous sparsity-based optimization problem. Experimental results obtained on Chokepoint and COX-S2V datasets, using different face representations, indicate that the proposed approach can outperform state-of-the-art SRC-based methods for still-to-video FR with a single sample per person.