Abstract:This work identifies and advances some research challenges in the analysis of facial features and their temporal dynamics with epistemic mental states in dyadic conversations. Epistemic states are: Agreement, Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of statistical analyses and simulations to identify the relationship between facial features and epistemic states. Non-linear relations are found to be more prevalent, while temporal features derived from original facial features have demonstrated a strong correlation with intensity changes. Then, we propose a novel prediction framework that takes facial features and their nonlinear relation scores as input and predict different epistemic states in videos. The prediction of epistemic states is boosted when the classification of emotion changing regions such as rising, falling, or steady-state are incorporated with the temporal features. The proposed predictive models can predict the epistemic states with significantly improved accuracy: correlation coefficient (CoERR) for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for Certain 0.854, and for Interest 0.913.