Abstract: Accurately estimating the direction-of-arrival (DOA) of a speech source using a compact microphone array (CMA) is often complicated by background noise and reverberation. A commonly used DOA estimation method is the steered response power with phase transform (SRP-PHAT) function, which has been shown to work reliably in moderate levels of noise and reverberation. For closely spaced microphones, however, the spatial coherence of noise and reverberation may be high over an extended frequency range, which may negatively affect the SRP-PHAT spectra and result in DOA estimation errors. Assuming the availability of an auxiliary microphone at an unknown position that is spatially separated from the CMA, in this paper we propose to compute the SRP-PHAT spectra between the microphones of the CMA based on the SRP-PHAT spectra between the auxiliary microphone and the microphones of the CMA. For different levels of noise and reverberation, we show how far the auxiliary microphone needs to be spatially separated from the CMA for the auxiliary microphone-based SRP-PHAT spectra to be more reliable than the SRP-PHAT spectra without the auxiliary microphone. These findings are validated using simulated microphone signals for several auxiliary microphone positions and two different noise and reverberation conditions.
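For orientation, the conventional SRP-PHAT function that this abstract builds on can be sketched as follows. This is a minimal illustration and not the proposed auxiliary-microphone method, whose details the abstract does not give. The function name `srp_phat`, the 2D far-field (plane-wave) model, the single-frame processing, and the speed of sound of 343 m/s are all assumptions made for this sketch.

```python
import numpy as np
from itertools import combinations


def srp_phat(frames, mic_xy, fs, angles_rad, c=343.0):
    """Conventional SRP-PHAT over candidate azimuth angles (illustrative sketch).

    Assumptions (not taken from the abstract): 2D geometry, far-field source,
    one time frame, hypothetical argument names.

    frames     : (M, N) array, one frame of N samples per microphone
    mic_xy     : (M, 2) array, microphone positions in metres
    fs         : sampling rate in Hz
    angles_rad : (A,) array of candidate DOAs in radians
    returns    : (A,) array of steered response power values
    """
    n_mics, n_samples = frames.shape
    spectra = np.fft.rfft(frames, axis=1)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n_samples, d=1.0 / fs)

    # Unit vectors pointing from the array towards each candidate direction.
    directions = np.stack((np.cos(angles_rad), np.sin(angles_rad)), axis=1)
    # Relative arrival time of a plane wave at each microphone, per candidate.
    arrival = -(mic_xy @ directions.T) / c  # shape (M, A)

    power = np.zeros(len(angles_rad))
    for i, j in combinations(range(n_mics), 2):
        cross = spectra[i] * np.conj(spectra[j])
        cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
        tdoa = arrival[i] - arrival[j]  # expected TDOA per candidate angle
        steering = np.exp(1j * np.outer(tdoa, omega))  # shape (A, F)
        power += np.real(steering @ cross)
    return power
```

The DOA estimate is the candidate angle that maximizes the returned values, for example `angles_rad[np.argmax(srp_phat(frames, mic_xy, fs, angles_rad))]`. Because the PHAT weighting discards magnitude information, spatially coherent noise at closely spaced microphones contributes to the sum as strongly as the speech source does, which is the sensitivity the abstract describes.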
Abstract: A popular approach for 3D source localization using multiple microphones is the steered-response power method, where the source position is directly estimated by maximizing a function of three continuous position variables. Instead of directly estimating the source position, in this paper we propose an indirect, distance-based method for 3D source localization. Based on properties of Euclidean distance matrices (EDMs), we reformulate the 3D source localization problem as the minimization of a cost function of a single variable, namely the distance between the source and the reference microphone. Using the known microphone geometry and estimated time differences of arrival (TDOAs) between the microphones, we show how the 3D source position can be computed based on this variable. In addition, instead of using a single TDOA estimate per microphone pair, we propose an extension that selects the most appropriate estimate from a set of candidate TDOA estimates, which is especially relevant in reverberant environments with strong early reflections. Experimental results for different source and microphone constellations show that the proposed EDM-based method consistently outperforms the steered-response power method, especially when the source is close to the microphones.
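The single-variable idea can be illustrated with a short sketch. The abstract does not state the cost function, so the one used here is an assumption: it relies on the standard EDM property that the double-centered Gram matrix of points in 3D is positive semidefinite with rank at most 3, and penalizes the eigenvalues beyond the three largest. The function names, the grid search, the least-squares position recovery, and the speed of sound of 343 m/s are likewise hypothetical choices for illustration, and the candidate-TDOA selection extension is not covered.

```python
import numpy as np


def edm_cost(d_ref, mics, tdoas, c=343.0):
    """Assumed cost: how far the augmented EDM is from a rank-3 embedding.

    d_ref : candidate distance between source and reference microphone (mics[0])
    mics  : (M, 3) microphone positions in metres
    tdoas : (M,) TDOAs relative to the reference microphone, tdoas[0] == 0
    """
    n_mics = mics.shape[0]
    dists = d_ref + c * tdoas  # source-to-microphone distances for this d_ref
    edm = np.zeros((n_mics + 1, n_mics + 1))
    diff = mics[:, None, :] - mics[None, :, :]
    edm[:n_mics, :n_mics] = np.sum(diff**2, axis=2)  # known geometry
    edm[:n_mics, n_mics] = dists**2                   # source column
    edm[n_mics, :n_mics] = dists**2                   # source row
    # Double centering turns squared distances into a Gram matrix.
    centering = np.eye(n_mics + 1) - np.ones((n_mics + 1, n_mics + 1)) / (n_mics + 1)
    gram = -0.5 * centering @ edm @ centering
    eigvals = np.sort(np.linalg.eigvalsh(gram))[::-1]
    # For a valid 3D configuration, everything beyond the top three is zero.
    return np.sum(np.abs(eigvals[3:]))


def source_from_distances(mics, dists):
    """Linear least-squares position from source-to-microphone distances."""
    a = -2.0 * (mics[1:] - mics[0])
    b = (dists[1:]**2 - dists[0]**2
         - np.sum(mics[1:]**2, axis=1) + np.sum(mics[0]**2))
    position, *_ = np.linalg.lstsq(a, b, rcond=None)
    return position


def localize(mics, tdoas, c=343.0, d_grid=None):
    """Grid search over the single variable d_ref, then recover the position."""
    if d_grid is None:
        d_grid = np.linspace(0.05, 5.0, 2000)
    # Keep only candidates for which all implied distances are positive.
    valid = [d for d in d_grid if np.all(d + c * tdoas > 0)]
    costs = [edm_cost(d, mics, tdoas, c) for d in valid]
    d_hat = valid[int(np.argmin(costs))]
    return d_hat, source_from_distances(mics, d_hat + c * tdoas)
```

The sketch needs at least five non-coplanar microphones for the rank condition to determine the distance generically, and it treats the TDOAs as exact. The search is one-dimensional, in contrast to the three-dimensional search of the steered-response power method.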