Abstract:Speech Emotion Recognition (SER) is of great importance in Human-Computer Interaction (HCI), as it provides a deeper understanding of the situation and results in better interaction. In recent years, various machine learning and deep learning algorithms have been developed to improve SER techniques. Recognition of emotions depends on the type of expression that varies between different languages. In this article, to further study this important factor in Farsi, we examine various deep learning techniques on the SheEMO dataset. Using signal features in low- and high-level descriptions and different deep networks and machine learning techniques, Unweighted Average Recall (UAR) of 65.20 is achieved with an accuracy of 78.29.