Abstract: Feature subspace selection is an important part of speech emotion recognition. Most studies are devoted to finding a single feature subspace for representing all emotions. However, some studies have indicated that the features associated with different emotions are not exactly the same. Hence, traditional methods may fail to distinguish some emotions with just one global feature subspace. In this work, we propose a divide-and-conquer approach to solve this problem. First, a feature subspace is constructed for each combination of two different emotions (an emotion pair). Binary classifiers (bi-classifiers) are then trained on each of these feature subspaces. The final emotion recognition result is derived by a voting and competition method. Experimental results demonstrate that the proposed method achieves better results than the traditional multi-class classification method.
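The emotion-pair scheme described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nearest-centroid bi-classifier, the hand-picked feature indices in `subspaces`, the toy data, and the helper names are hypothetical and are not taken from the paper. The final decision uses plain majority voting only; the competition step mentioned in the abstract is not specified there and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def project(x, idx):
    """Keep only the features listed in idx (the pair's feature subspace)."""
    return [x[i] for i in idx]


def train_pairwise(X, y, subspaces):
    """Train one bi-classifier per emotion pair, each in its own subspace.

    subspaces maps an alphabetically sorted pair (a, b) to feature indices.
    The bi-classifier here is a nearest-centroid rule (an assumption).
    """
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        idx = subspaces[(a, b)]
        cents = {}
        for label in (a, b):
            rows = [project(x, idx) for x, lab in zip(X, y) if lab == label]
            cents[label] = centroid(rows)
        models[(a, b)] = (idx, cents)
    return models


def predict(models, x):
    """Let every pairwise classifier vote; return the most-voted emotion."""
    votes = Counter()
    for (a, b), (idx, cents) in models.items():
        z = project(x, idx)
        winner = a if sq_dist(z, cents[a]) <= sq_dist(z, cents[b]) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Toy data: three features, three emotions (all values are made up).
X = [
    [1.0, 0.0, 0.1], [0.9, 0.1, 0.0],   # angry
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],   # happy
    [0.0, 0.0, 1.0], [0.1, 0.1, 0.9],   # sad
]
y = ["angry", "angry", "happy", "happy", "sad", "sad"]

# Hypothetical per-pair feature subspaces (indices into each feature vector).
subspaces = {
    ("angry", "happy"): [0, 1],
    ("angry", "sad"): [0, 2],
    ("happy", "sad"): [1, 2],
}

models = train_pairwise(X, y, subspaces)
```

With K emotions this trains K(K-1)/2 bi-classifiers, so the three toy emotions yield three models. In practice the nearest-centroid rule would be replaced by whichever binary classifier and feature selection procedure the study actually uses.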
Abstract: In this paper, we first present a new variant of the Gaussian restricted Boltzmann machine (GRBM), called the multivariate Gaussian restricted Boltzmann machine (MGRBM), together with its definition and learning algorithm. We then propose using a learned GRBM or MGRBM to extract better features for robust speech recognition. Our experiments on Aurora2 show that both GRBM-extracted and MGRBM-extracted features perform much better than Mel-frequency cepstral coefficients (MFCCs) with either an HMM-GMM or a hybrid HMM-deep neural network (DNN) acoustic model, and that MGRBM-extracted features are slightly better than GRBM-extracted ones.
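As a rough illustration of what extracting features with a learned GRBM can look like, the sketch below computes hidden-unit activation probabilities for one input frame. It assumes one common GRBM parametrization, p(h_j = 1 | v) = sigmoid(b_j + sum_i W_ij v_i / sigma_i), and assumes that these probabilities serve as the features; the abstract confirms neither choice. The function name and the toy parameters are hypothetical, and the MGRBM variant is not covered because its definition is not given here.

```python
import math


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def grbm_hidden_features(v, W, b, sigma):
    """Hidden activation probabilities of a GRBM for one visible vector.

    v:     visible vector (e.g. one frame of acoustic features)
    W:     weights, W[i][j] links visible unit i to hidden unit j
    b:     hidden biases, one per hidden unit
    sigma: per-dimension standard deviations of the Gaussian visible units
    Assumed form: p(h_j = 1 | v) = sigmoid(b_j + sum_i W[i][j] * v[i] / sigma[i])
    """
    features = []
    for j in range(len(b)):
        total = b[j] + sum(W[i][j] * v[i] / sigma[i] for i in range(len(v)))
        features.append(sigmoid(total))
    return features


# Toy parameters (made up): 2 visible units, 2 hidden units.
W = [[1.0, 0.0],
     [0.0, 0.0]]
b = [0.0, 0.0]
sigma = [1.0, 1.0]

feats = grbm_hidden_features([2.0, 5.0], W, b, sigma)
```

In this toy setup the second hidden unit has zero weights and zero bias, so its activation is exactly 0.5 regardless of the input, while the first unit responds only to the first visible dimension. A trained model would supply learned values for `W`, `b`, and `sigma`.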