Abstract:In many multi-microphone algorithms for noise reduction, an estimate of the relative transfer function (RTF) vector of the target speaker is required. The state-of-the-art covariance whitening (CW) method estimates the RTF vector as the principal eigenvector of the whitened noisy covariance matrix, where whitening is performed using an estimate of the noise covariance matrix. In this paper, we consider an acoustic sensor network consisting of multiple microphone nodes. Assuming uncorrelated noise between the nodes but not within the nodes, we propose two RTF vector estimation methods that leverage the block-diagonal structure of the noise covariance matrix. The first method modifies the CW method by considering only the diagonal blocks of the estimated noise covariance matrix. In contrast, the second method only considers the off-diagonal blocks of the noisy covariance matrix, but cannot be solved using a simple eigenvalue decomposition. When applying the estimated RTF vector in a minimum variance distortionless response beamformer, simulation results for real-world recordings in a reverberant environment with multiple noise sources show that the modified CW method performs slightly better than the CW method in terms of SNR improvement, while the off-diagonal selection method outperforms a biased RTF vector estimate obtained as the principal eigenvector of the noisy covariance matrix.
Abstract:This paper addresses the challenge of estimating the relative transfer function (RTF) vectors of multiple speakers in a noisy and reverberant environment. More specifically, we consider a scenario where two speakers activate successively. In this scenario, the RTF vector of the first speaker can be estimated in a straightforward way and the main challenge lies in estimating the RTF vector of the second speaker during segments where both speakers are simultaneously active. To estimate the RTF vector of the second speaker the so-called blind oblique projection (BOP) method determines the oblique projection operator that optimally blocks the second speaker. Instead of blocking the second speaker, in this paper we propose a covariance blocking and whitening (CBW) method, which first blocks the first speaker and applies whitening using the estimated noise covariance matrix and then estimates the RTF vector of the second speaker based on a singular value decomposition. When using the estimated RTF vectors of both speakers in a linearly constrained minimum variance beamformer, simulation results using real-world recordings for multiple speaker positions demonstrate that the proposed CBW method outperforms the conventional BOP and covariance whitening methods in terms of signal-to-interferer-and-noise ratio improvement.
Abstract:Interfering sources, background noise and reverberation degrade speech quality and intelligibility in hearing aid applications. In this paper, we present an adaptive algorithm aiming at dereverberation, noise and interferer reduction and preservation of binaural cues based on the wBLCMP beamformer. The wBLCMP beamformer unifies the multi-channel weighted prediction error method performing dereverberation and the linearly constrained minimum power beamformer performing noise and interferer reduction into a single convolutional beamformer. We propose to adaptively compute the optimal filter by incorporating an exponential window into a sparsity-promoting lp-norm cost function, which enables to track a moving target speaker. Simulation results with successive target speakers at different positions show that the proposed adaptive version of the wBLCMP beamformer outperforms a non-adaptive version in terms of objective speech enhancement performance measures.
Abstract:Recently, the convolutional weighted power minimization distortionless response (WPD) beamformer was proposed, which unifies multi-channel weighted prediction error dereverberation and minimum power distortionless response beamforming. To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes the sparsity of the desired speech component in the short-time Fourier transform domain compared to the noisy microphone signals. In this paper we generalize the convolutional WPD beamformer by using an lp-norm cost function, introducing an adjustable shape parameter which enables to control the sparsity of the desired speech component. Experiments based on the REVERB challenge dataset show that the proposed method outperforms the conventional convolutional WPD beamformer in terms of objective speech quality metrics.