Abstract: Major depression, also known as clinical depression, is characterized by a persistent sense of despair and hopelessness. It is a major mental disorder that can affect people of any age, including children, and that negatively affects a person's personal life, work life, social life, and health. Globally, over 300 million people of all ages are estimated to suffer from clinical depression. This paper presents a deep recurrent neural network-based framework to detect depression and predict its severity level from speech. Low-level and high-level audio features are extracted from audio recordings to predict the 24 scores of the Patient Health Questionnaire (a depression assessment test) and the binary class of depression diagnosis. To overcome the small size of Speech Depression Recognition (SDR) datasets, data augmentation techniques are used to expand the labeled training set, and transfer learning is performed: the proposed model is first trained on a related task and then reused as the starting point for the SDR task. The proposed framework is evaluated on the DAIC-WOZ corpus of the AVEC2017 challenge, and promising results are obtained. An overall accuracy of 76.27% with a root mean square error of 0.4 is achieved in assessing depression, while a root mean square error of 0.168 is achieved in predicting the depression severity levels.