Abstract:In this paper, we advance a recently-proposed uncertainty decoding scheme for DNN-HMM (deep neural network - hidden Markov model) hybrid systems. This numerical sampling concept averages DNN outputs produced by a finite set of feature samples (drawn from a probabilistic distortion model) to approximate the posterior likelihoods of the context-dependent HMM states. As main innovation, we propose a weighted DNN-output averaging based on a minimum classification error criterion and apply it to a probabilistic distortion model for spatial diffuseness features. The experimental evaluation is performed on the 8-channel REVERB Challenge task using a DNN-HMM hybrid system with multichannel front-end signal enhancement. We show that the recognition accuracy of the DNN-HMM hybrid system improves by incorporating uncertainty decoding based on random sampling and that the proposed weighted DNN-output averaging further reduces the word error rate scores.