Abstract: We propose a data-driven sparse recovery framework for hybrid spherical-linear microphone arrays based on the singular value decomposition (SVD) of the transfer operator. The SVD yields orthogonal microphone modes and field modes. These reduce to spherical harmonics (SH) when only a spherical microphone array (SMA) is used, whereas adding linear microphone arrays (LMAs) introduces complementary modes beyond SH. Modal analysis reveals a consistent divergence from SH across frequency, confirming the improved spatial selectivity. Experiments in reverberant conditions show that the method reduces energy-map mismatch and angular error across frequency, distance, and source count, and that it outperforms both SMA-only processing and direct concatenation of the array signals. The results demonstrate that SVD-modal processing provides a principled and unified treatment of hybrid arrays for robust sparse sound-field reconstruction.
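The SVD step described in the abstract above can be illustrated with a minimal sketch. It assumes a free-field plane-wave model, an open (acoustically transparent) sphere, and a made-up geometry of 32 sphere microphones plus one 8-element line. The function names, radius, spacing, frequency, and truncation threshold are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def fibonacci_sphere(n):
    """Return n roughly uniformly spread unit vectors, shape (n, 3)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azim = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.stack([np.sin(polar) * np.cos(azim),
                     np.sin(polar) * np.sin(azim),
                     np.cos(polar)], axis=1)

def transfer_matrix(mic_pos, directions, freq_hz, c=343.0):
    """Plane-wave transfer matrix H[m, d] = exp(j * k * r_m . u_d) (assumed model)."""
    k = 2.0 * np.pi * freq_hz / c
    return np.exp(1j * k * (mic_pos @ directions.T))

def svd_modes(H, rel_tol=1e-3):
    """Return microphone modes U, singular values s, and field modes V.

    Modes whose singular value is below rel_tol * s[0] are discarded.
    """
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    keep = s > rel_tol * s[0]
    return U[:, keep], s[keep], Vh[keep].conj().T

# Hypothetical hybrid geometry (assumed values, not the paper's setup).
sma = 0.042 * fibonacci_sphere(32)                      # sphere of radius 4.2 cm
lma = np.stack([np.linspace(0.5, 0.8, 8),               # one line along the x axis
                np.zeros(8), np.zeros(8)], axis=1)
directions = fibonacci_sphere(500)                      # candidate plane-wave directions

H_sma = transfer_matrix(sma, directions, freq_hz=1000.0)
H_hybrid = transfer_matrix(np.vstack([sma, lma]), directions, freq_hz=1000.0)

for name, H in [("SMA only", H_sma), ("SMA + LMA", H_hybrid)]:
    U, s, V = svd_modes(H)
    print(f"{name}: {len(s)} modes above the relative threshold")
```

In this sketch the columns of `U` play the role of microphone modes and the columns of `V` the role of field modes. A sparse solver would then operate on the projected observations `U.conj().T @ p` rather than on the raw microphone signals `p`.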
Abstract: Spherical microphone arrays (SMAs) are widely used for sound field analysis, and sparse recovery (SR) techniques can significantly enhance their spatial resolution by modeling the sound field as a sparse superposition of dominant plane waves. However, the spatial resolution of SMAs is fundamentally limited by their spherical harmonic order, and their performance often degrades in reverberant environments. This paper proposes a two-stage SR framework with residue refinement that integrates observations from a central SMA and four surrounding linear microphone arrays (LMAs). The core idea is to exploit the complementary spatial characteristics of the two array types by treating the SMA as the primary estimator and the LMAs as a refiner. Simulation results show that the proposed SMA-LMA method significantly improves spatial energy map reconstruction under varying reverberation conditions compared with both SMA-only processing and direct one-step joint processing, indicating improved spatial fidelity and robustness in complex acoustic environments.
Abstract: Function and affordance are critical aspects of 3D scene understanding and support task-oriented objectives. In this work, we develop a model that learns to structure and vary functional affordance across a 3D hierarchical scene graph (3DHSG) representing the spatial organization of a scene, so that the affordances vary with the spatial context of the graph. Starting from segmented object point clouds and object semantic labels, our algorithm learns to construct a 3DHSG with a top node that identifies the room label, child nodes that define local spatial regions inside the room with region-specific affordances, and grandchild nodes that indicate object locations and object-specific affordances. To support this work, we create a custom 3DHSG dataset that provides ground truth for the local spatial regions with their region-specific affordances and for the object-specific affordances of each object. We employ a transformer-based model within a multi-task learning framework that jointly learns room classification and the definition of spatial regions within the room with region-specific affordances. Our work improves on the performance of state-of-the-art baseline models and shows one approach for applying transformer models to 3D scene understanding and to the generation of 3DHSGs that capture the spatial organization of a room. The code and dataset are publicly available.
Abstract: The propagation of sound in a shallow water environment is characterized by boundary reflections from the sea surface and sea floor. These reflections result in multiple (indirect) sound propagation paths, which can degrade the performance of passive sound source localization methods. This paper proposes the use of convolutional neural networks (CNNs) for the localization of sources of broadband acoustic radiated noise (such as motor vessels) in shallow water multipath environments. It is shown that CNNs operating on cepstrogram and generalized cross-correlogram inputs estimate the instantaneous range and bearing of transiting motor vessels more reliably in conditions where the performance of conventional passive ranging methods is degraded. The improvement in source localization performance is demonstrated using real data collected during an at-sea experiment.
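One of the two CNN input types named above, the generalized cross-correlogram, is built from frame-wise generalized cross-correlations between pairs of sensors. The sketch below computes a single such frame. It uses PHAT weighting, which is one common choice and an assumption here because the abstract does not state the weighting; the function name, signal length, delay, and sampling rate are likewise hypothetical.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Generalized cross-correlation of x and y with PHAT weighting.

    Returns (lags_in_seconds, correlation). A positive peak lag means
    that x is a delayed copy of y.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = (n - 1) // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.roll(cc, max_shift)[: 2 * max_shift + 1]
    lags = np.arange(-max_shift, max_shift + 1) / fs
    return lags, cc

# Synthetic two-sensor example (assumed values for illustration only).
fs = 8000
delay_samples = 25
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)           # broadband noise stand-in
y = source
x = np.concatenate((np.zeros(delay_samples), source[:-delay_samples]))

lags, cc = gcc_phat(x, y, fs)
estimated_delay = lags[np.argmax(cc)]
print(f"estimated delay: {estimated_delay * fs:.0f} samples")
```

Stacking the `cc` vectors of successive frames as columns gives a time-versus-lag image of the kind that could be fed to a CNN. In a multipath environment, additional ridges from surface and bottom reflections would appear alongside the direct-path peak.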