Abstract: Emotion prediction is an emerging research area that focuses on identifying and forecasting a person's emotional state from multiple modalities. Among other data sources, physiological data can serve as an indicator of emotion, with the added advantages that it cannot be masked or tampered with by the individual and that it is easy to collect. This paper surveys machine learning methods that use smartphone and physiological data to predict emotions in real time, with self-reported ecological momentary assessment (EMA) scores as ground truth. By comparing regression, long short-term memory (LSTM) networks, convolutional neural networks (CNN), reinforcement online learning (ROL), and deep belief networks (DBN), we illustrate the variety of machine learning methods employed for emotion prediction. We compare the state-of-the-art methods and highlight that their experimental performance remains limited. Performance can be improved in future work by addressing the following issues: improving scalability and generalizability, synchronizing multimodal data, optimizing EMA sampling, integrating adaptability with sequence prediction, collecting unbiased data, and leveraging sophisticated feature engineering techniques.