Abstract:We consider speech enhancement for signals picked up in one noisy environment that must be rendered to a listener in another noisy environment. For both far-end noise reduction and near-end listening enhancement, it has been shown that excessive focus on noise suppression or intelligibility maximization may lead to excessive speech distortions and quality degradations in favorable noise conditions, where intelligibility is already at ceiling level. Recently [1,2] propose to remedy this with a minimum processing framework that either reduces noise or enhances listening a minimum amount given that a certain intelligibility criterion is still satisfied. Additionally, it has been shown that joint consideration of both environments improves speech enhancement performance. In this paper, we formulate a joint far- and near-end minimum processing framework, that improves intelligibility while limiting speech distortions in favorable noise conditions. We provide closed-form solutions to specific boundary scenarios and investigate performance for the general case using numerical optimization. We also show that concatenating existing minimum processing far- and near-end enhancement methods preserves the effects of the initial methods. Results show that the joint optimization can further improve performance compared to the concatenated approach.
Abstract:The intelligibility and quality of speech from a mobile phone or public announcement system are often affected by background noise in the listening environment. By pre-processing the speech signal it is possible to improve the speech intelligibility and quality -- this is known as near-end listening enhancement (NLE). Although, existing NLE techniques are able to greatly increase intelligibility in harsh noise environments, in favorable noise conditions the intelligibility of speech reaches a ceiling where it cannot be further enhanced. Actually, the focus of existing methods solely on improving the intelligibility causes unnecessary processing of the speech signal and leads to speech distortions and quality degradations. In this paper, we provide a new rationale for NLE, where the target speech is minimally processed in terms of a processing penalty, provided that a certain performance constraint, e.g., intelligibility, is satisfied. We present a closed-form solution for the case where the performance criterion is an intelligibility estimator based on the approximated speech intelligibility index and the processing penalty is the mean-square error between the processed and the clean speech. This produces an NLE method that adapts to changing noise conditions via a simple gain rule by limiting the processing to the minimum necessary to achieve a desired intelligibility, while at the same time focusing on quality in favorable noise situations by minimizing the amount of speech distortions. Through simulation studies, we show the proposed method attains speech quality on par or better than existing methods in both objective measurements and subjective listening tests, whilst still sustaining objective speech intelligibility performance on par with existing methods.