Abstract:Traditional psychological evaluations rely heavily on human observation and interpretation, which are prone to subjectivity, bias, fatigue, and inconsistency. To address these limitations, this work presents a multimodal emotion recognition system that provides a standardised, objective, and data-driven tool to support evaluators, such as psychologists, psychiatrists, and clinicians. The system integrates recognition of facial expressions, speech, spoken language, and body movement analysis to capture subtle emotional cues that are often overlooked in human evaluations. By combining these modalities, the system provides more robust and comprehensive emotional state assessment, reducing the risk of mis- and overdiagnosis. Preliminary testing in a simulated real-world condition demonstrates the system's potential to provide reliable emotional insights to improve the diagnostic accuracy. This work highlights the promise of automated multimodal analysis as a valuable complement to traditional psychological evaluation practices, with applications in clinical and therapeutic settings.