Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Microphone Array

Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation

Jul 09, 2025

Klaus Brümann, Kouei Yamaoka, Nobutaka Ono, Simon Doclo

Abstract:Estimating the position of a speech source based on time-differences-of-arrival (TDOAs) is often adversely affected by background noise and reverberation. A popular method to estimate the TDOA between a microphone pair involves maximizing a generalized cross-correlation with phase transform (GCC-PHAT) function. Since the TDOAs across different microphone pairs satisfy consistency relations, generally only a small subset of microphone pairs are used for source position estimation. Although the set of microphone pairs is often determined based on a reference microphone, recently a more robust method has been proposed to determine the set of microphone pairs by computing the minimum spanning tree (MST) of a signal graph of GCC-PHAT function reliabilities. To reduce the influence of noise and reverberation on the TDOA estimation accuracy, in this paper we propose to compute the GCC-PHAT functions of the MST based on an average of multiple cross-power spectral densities (CPSDs) using an incremental method. In each step of the method, we increase the number of CPSDs over which we average by considering CPSDs computed indirectly via other microphones from previous steps. Using signals recorded in a noisy and reverberant laboratory with an array of spatially distributed microphones, the performance of the proposed method is evaluated in terms of TDOA estimation error and 2D source position estimation error. Experimental results for different source and microphone configurations and three reverberation conditions show that the proposed method considering multiple CPSDs improves the TDOA estimation and source position estimation accuracy compared to the reference microphone- and MST-based methods that rely on a single CPSD as well as steered-response power-based source position estimation.

Via

Access Paper or Ask Questions

Analysis of Null Related Beampattern Measures and Signal Quantization Effects for Linear Differential Microphone Arrays

Jun 26, 2025

Shweta Pal, Arun Kumar, Monika Agrawal

Abstract:A differential microphone array (DMA) offers enhanced capabilities to obtain sharp nulls at the cost of relatively broad peaks in the beam power pattern. This can be used for applications that require nullification or attenuation of interfering sources. To the best of our knowledge, the existing literature lacks measures that directly assess the efficacy of nulls, and null-related measures have not been investigated in the context of differential microphone arrays (DMAs). This paper offers new insights about the utility of DMAs by proposing measures that characterize the nulls in their beam power patterns. We investigate the performance of differential beamformers by presenting and evaluating null-related measures namely null depth (ND) and Null Width (NW) as a function of depth level relative to the beam power pattern maxima. A study of signal quantization effects due to data acquisition for 1st, 2nd and 3rd order linear DMAs and for different beampatterns i.e. dipole, cardioid, hypercardioid and supercardioid is presented. An analytical expression for the quantized beamformed output for any general $ N^{th} $ order DMA is formulated. Simulation results of the variation of ND with number of quantization bits and the variation of NW as a function of depth are also presented and inferences are drawn. Lab experiments are conducted in a fully anechoic room to support the simulation results. The measured beampattern exhibits a pronounced null depth, confirming the effectiveness of the experimental setup.

* 10 pages, 15 Figures, 3 Tables

Via

Access Paper or Ask Questions

A Wireless Self-Calibrating Ultrasound Microphone Array with Sub-Microsecond Synchronization

Jun 24, 2025

Dennis Laurijssen, Rens Baeyens, Walter Daems, Jan Steckel

Abstract:We present a novel system architecture for a distributed wireless, self-calibrating ultrasound microphone network for synchronized in-air acoustic sensing. Once deployed the embedded nodes determine their position in the environment using the infrared optical tracking system found in the HTC Vive Lighthouses. After self-calibration, the nodes start sampling the ultrasound microphone while embedding a synchronization signal in the data which is established using a wireless Sub-1GHz RF link. Data transmission is handled via the Wi-Fi 6 radio that is embedded in the nodes' SoC, decoupling synchronization from payload transport. A prototype system with a limited amount of network nodes was used to verify the proposed distributed microphone array's wireless data acquisition and synchronization capabilities. This architecture lays the groundwork for scalable, deployable ultrasound arrays for sound source localization applications in bio-acoustic research and industrial acoustic monitoring.

Via

Access Paper or Ask Questions

Effect of Signal Quantization on Performance Measures of a 1st Order One Dimensional Differential Microphone Array

Jun 18, 2025

Shweta Pal, Arun Kumar, Monika Agrawal

Abstract:In practical systems, recorded analog signals must be digitized for processing, introducing quantization as a critical aspect of data acquisition. While prior studies have examined quantization effects in various signal processing contexts, its impact on differential microphone arrays (DMAs), particularly in one-dimensional (1D) first-order configurations, remains unexplored. This paper investigates the influence of signal quantization on performance of first-order 1D DMAs across various beampatterns. An analytical expression for quantized beamformed output for a first-order 1D DMA has been formulated. The effect of signal quantization has been studied on array performance measures such as the Beampattern, Directivity Factor (DF), Front-to-Back Ratio (FBR), and null depth (ND). Simulation results reveal that beampattern shape remains structurally invariant across quantization bit depths, with quantization primarily affecting ND. DF and FBR remain constant with the varying number of quantization bits. Additionally, ND is shown to be frequency-independent; however, it increases with increasing quantization bit depths, enhancing interference suppression. The study also examines the effect of steering nulls across the azimuthal range, showing that ND degrades as the null moves closer to the source look direction, indicating reduced interference suppression.

* 5 Pages with 6 figures and 1 table

Via

Access Paper or Ask Questions

Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition

Jun 17, 2025

Jiamin Xie, Ju Lin, Yiteng Huang, Tyler Vuong, Zhaojiang Lin, Zhaojun Yang, Peng Su, Prashant Rawat, Sangeeta Srivastava, Ming Sun(+1 more)

Abstract:Recent studies have demonstrated that prompting large language models (LLM) with audio encodings enables effective speech recognition capabilities. However, the ability of Speech LLMs to comprehend and process multi-channel audio with spatial cues remains a relatively uninvestigated area of research. In this work, we present directional-SpeechLlama, a novel approach that leverages the microphone array of smart glasses to achieve directional speech recognition, source localization, and bystander cross-talk suppression. To enhance the model's ability to understand directivity, we propose two key techniques: serialized directional output training (S-DOT) and contrastive direction data augmentation (CDDA). Experimental results show that our proposed directional-SpeechLlama effectively captures the relationship between textual cues and spatial audio, yielding strong performance in both speech recognition and source localization tasks.

* Accepted to Interspeech 2025

Via

Access Paper or Ask Questions

Observability-Aware Active Calibration of Multi-Sensor Extrinsics for Ground Robots via Online Trajectory Optimization

Jun 16, 2025

Jiang Wang, Yaozhong Kang, Linya Fu, Kazuhiro Nakadai, He Kong

Abstract:Accurate calibration of sensor extrinsic parameters for ground robotic systems (i.e., relative poses) is crucial for ensuring spatial alignment and achieving high-performance perception. However, existing calibration methods typically require complex and often human-operated processes to collect data. Moreover, most frameworks neglect acoustic sensors, thereby limiting the associated systems' auditory perception capabilities. To alleviate these issues, we propose an observability-aware active calibration method for ground robots with multimodal sensors, including a microphone array, a LiDAR (exteroceptive sensors), and wheel encoders (proprioceptive sensors). Unlike traditional approaches, our method enables active trajectory optimization for online data collection and calibration, contributing to the development of more intelligent robotic systems. Specifically, we leverage the Fisher information matrix (FIM) to quantify parameter observability and adopt its minimum eigenvalue as an optimization metric for trajectory generation via B-spline curves. Through planning and replanning of robot trajectory online, the method enhances the observability of multi-sensor extrinsic parameters. The effectiveness and advantages of our method have been demonstrated through numerical simulations and real-world experiments. For the benefit of the community, we have also open-sourced our code and data at https://github.com/AISLAB-sustech/Multisensor-Calibration.

* Accepted and to appear in the IEEE Sensors Journal

Via

Access Paper or Ask Questions

Lightweight and Robust Multi-Channel End-to-End Speech Recognition with Spherical Harmonic Transform

Jun 13, 2025

Xiangzhu Kong, Huang Hao, Zhijian Ou

Abstract:This paper presents SHTNet, a lightweight spherical harmonic transform (SHT) based framework, which is designed to address cross-array generalization challenges in multi-channel automatic speech recognition (ASR) through three key innovations. First, SHT based spatial sound field decomposition converts microphone signals into geometry-invariant spherical harmonic coefficients, isolating signal processing from array geometry. Second, the Spatio-Spectral Attention Fusion Network (SSAFN) combines coordinate-aware spatial modeling, refined self-attention channel combinator, and spectral noise suppression without conventional beamforming. Third, Rand-SHT training enhances robustness through random channel selection and array geometry reconstruction. The system achieves 39.26\% average CER across heterogeneous arrays (e.g., circular, square, and binaural) on datasets including Aishell-4, Alimeeting, and XMOS, with 97.1\% fewer computations than conventional neural beamformers.

* Interspeech 2025

Via

Access Paper or Ask Questions

Deep learning based spatial aliasing reduction in beamforming for audio capture

May 26, 2025

Mateusz Guzik, Giulio Cengarle, Daniel Arteaga

Abstract:Spatial aliasing affects spaced microphone arrays, causing directional ambiguity above certain frequencies, degrading spatial and spectral accuracy of beamformers. Given the limitations of conventional signal processing and the scarcity of deep learning approaches to spatial aliasing mitigation, we propose a novel approach using a U-Net architecture to predict a signal-dependent de-aliasing filter, which reduces aliasing in conventional beamforming for spatial capture. Two types of multichannel filters are considered, one which treats the channels independently and a second one that models cross-channel dependencies. The proposed approach is evaluated in two common spatial capture scenarios: stereo and first-order Ambisonics. The results indicate a very significant improvement, both objective and perceptual, with respect to conventional beamforming. This work shows the potential of deep learning to reduce aliasing in beamforming, leading to improvements in multi-microphone setups.

* 5 pages, 4 figures; accepted for presentation in Interspeech 2025

Via

Access Paper or Ask Questions

ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior

May 17, 2025

Zhongweiyang Xu, Xulin Fan, Zhong-Qiu Wang, Xilin Jiang, Romit Roy Choudhury

Abstract:Blind Speech Separation (BSS) aims to separate multiple speech sources from audio mixtures recorded by a microphone array. The problem is challenging because it is a blind inverse problem, i.e., the microphone array geometry, the room impulse response (RIR), and the speech sources, are all unknown. We propose ArrayDPS to solve the BSS problem in an unsupervised, array-agnostic, and generative manner. The core idea builds on diffusion posterior sampling (DPS), but unlike DPS where the likelihood is tractable, ArrayDPS must approximate the likelihood by formulating a separate optimization problem. The solution to the optimization approximates room acoustics and the relative transfer functions between microphones. These approximations, along with the diffusion priors, iterate through the ArrayDPS sampling process and ultimately yield separated voice sources. We only need a simple single-speaker speech diffusion model as a prior along with the mixtures recorded at the microphones; no microphone array information is necessary. Evaluation results show that ArrayDPS outperforms all baseline unsupervised methods while being comparable to supervised methods in terms of SDR. Audio demos are provided at: https://arraydps.github.io/ArrayDPSDemo/.

* Paper Accepted at ICML2025 Demo: https://arraydps.github.io/ArrayDPSDemo/ Code: https://github.com/ArrayDPS/ArrayDPS

Via

Access Paper or Ask Questions

Unsupervised Blind Speech Separation with a Diffusion Prior

May 08, 2025

Zhongweiyang Xu, Xulin Fan, Zhong-Qiu Wang, Xilin Jiang, Romit Roy Choudhury

* Paper Accepted at ICML2025 Demo: https://arraydps.github.io/ArrayDPSDemo/ Code: https://github.com/ArrayDPS/ArrayDPS

Via

Access Paper or Ask Questions

Topic:Microphone Array

Papers and Code