In education and intervention programs, person's engagement has been identified as a major factor in successful program completion. Automatic measurement of person's engagement provides useful information for instructors to meet program objectives and individualize program delivery. In this paper, we present a novel approach for video-based engagement measurement in virtual learning programs. We propose to use affect states, continuous values of valence and arousal extracted from consecutive video frames, along with a new latent affective feature vector and behavioral features for engagement measurement. Deep learning-based temporal, and traditional machine-learning-based non-temporal models are trained and validated on frame-level, and video-level features, respectively. In addition to the conventional centralized learning, we also implement the proposed method in a decentralized federated learning setting and study the effect of model personalization in engagement measurement. We evaluated the performance of the proposed method on the only two publicly available video engagement measurement datasets, DAiSEE and EmotiW, containing videos of students in online learning programs. Our experiments show a state-of-the-art engagement level classification accuracy of 63.3% and correctly classifying disengagement videos in the DAiSEE dataset and a regression mean squared error of 0.0673 on the EmotiW dataset. Our ablation study shows the effectiveness of incorporating affect states in engagement measurement. We interpret the findings from the experimental results based on psychology concepts in the field of engagement.