Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Natsuki Ueno

Sound Field Estimation: Theories and Applications

Mar 13, 2025

Natsuki Ueno, Shoichi Koyama

Abstract:The spatial information of sound plays a crucial role in various situations, ranging from daily activities to advanced engineering technologies. To fully utilize its potential, numerous research studies on spatial audio signal processing have been carried out in the literature. Sound field estimation is one of the key foundational technologies that can be applied to a wide range of acoustic signal processing techniques, including sound field reproduction using loudspeakers and binaural playback through headphones. The purpose of this paper is to present an overview of sound field estimation methods. After providing the necessary mathematical background, two different approaches to sound field estimation will be explained. This paper focuses on clarifying the essential theories of each approach, while also referencing state-of-the-art developments. Finally, several acoustic signal processing technologies will be discussed as examples of the application of sound field estimation.

* Foundations and Trends in Signal Processing, vol. 19, no. 1, pp. 1-98, 2025
* published in Foundations and Trends in Signal Processing, vol. 19, no. 1; see https://www.nowpublishers.com/article/Details/SIG-121

Via

Access Paper or Ask Questions

Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers

Jan 09, 2025

Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

Abstract:Signal reconstruction from its mel-spectrogram is known as mel-spectrogram inversion and has many applications, including speech and foley sound synthesis. In this paper, we propose a mel-spectrogram inversion method based on a rigorous optimization algorithm. To reconstruct a time-domain signal with inverse short-time Fourier transform (STFT), both full-band STFT magnitude and phase should be predicted from a given mel-spectrogram. Their joint estimation has outperformed the cascaded full-band magnitude prediction and phase reconstruction by preventing error accumulation. However, the existing joint estimation method requires many iterations, and there remains room for performance improvement. We present an alternating direction method of multipliers (ADMM)-based joint estimation method motivated by its success in various nonconvex optimization problems including phase reconstruction. An efficient update of each variable is derived by exploiting the conditional independence among the variables. Our experiments demonstrate the effectiveness of the proposed method on speech and foley sounds.

* Accepted to ICASSP 2025

Via

Access Paper or Ask Questions

Physics-Informed Machine Learning For Sound Field Estimation

Aug 27, 2024

Shoichi Koyama, Juliano G. C. Ribeiro, Tomohiko Nakamura, Natsuki Ueno, Mirco Pezzoli

Figure 1 for Physics-Informed Machine Learning For Sound Field Estimation

Figure 2 for Physics-Informed Machine Learning For Sound Field Estimation

Figure 3 for Physics-Informed Machine Learning For Sound Field Estimation

Figure 4 for Physics-Informed Machine Learning For Sound Field Estimation

Abstract:The area of study concerning the estimation of spatial sound, i.e., the distribution of a physical quantity of sound such as acoustic pressure, is called sound field estimation, which is the basis for various applied technologies related to spatial audio processing. The sound field estimation problem is formulated as a function interpolation problem in machine learning in a simplified scenario. However, high estimation performance cannot be expected by simply applying general interpolation techniques that rely only on data. The physical properties of sound fields are useful a priori information, and it is considered extremely important to incorporate them into the estimation. In this article, we introduce the fundamentals of physics-informed machine learning (PIML) for sound field estimation and overview current PIML-based sound field estimation methods.

* Accepted to IEEE Signal Processing Magazine, Special Issue on Model-based and Data-Driven Audio Signal Processing

Via

Access Paper or Ask Questions

Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase

Jul 23, 2023

Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

Abstract:We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and is applicable to various audio signals. In this paper, we jointly reconstruct the full-band magnitude and phase by considering the bi-level relationships among the time-domain signal, its STFT coefficients, and its mel-spectrogram. The proposed method is formulated as a rigorous optimization problem and estimates the full-band magnitude based on the criterion used in GLA. Our experiments demonstrate the effectiveness of the proposed method on speech, music, and environmental signals.

* Accepted to IEEE WASPAA 2023

Via

Access Paper or Ask Questions

Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons

Mar 23, 2023

Shoichi Koyama, Keisuke Kimura, Natsuki Ueno

Abstract:Two sound field reproduction methods, weighted pressure matching and weighted mode matching, are theoretically and experimentally compared. The weighted pressure and mode matching are a generalization of conventional pressure and mode matching, respectively. Both methods are derived by introducing a weighting matrix in the pressure and mode matching. The weighting matrix in the weighted pressure matching is defined on the basis of the kernel interpolation of the sound field from pressure at a discrete set of control points. In the weighted mode matching, the weighting matrix is defined by a regional integration of spherical wavefunctions. It is theoretically shown that the weighted pressure matching is a special case of the weighted mode matching by infinite-dimensional harmonic analysis for estimating expansion coefficients from pressure observations. The difference between the two methods are discussed through experiments.

* Accepted to Journal of Audio Engineering Society, Special Issue on Spatial Audio

Via

Access Paper or Ask Questions

Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field

Dec 10, 2021

Keisuke Kimura, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari

Figure 1 for Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field

Figure 2 for Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field

Figure 3 for Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field

Figure 4 for Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field

Abstract:A method of optimizing secondary source placement in sound field synthesis is proposed. Such an optimization method will be useful when the allowable placement region and available number of loudspeakers are limited. We formulate a mean-square-error-based cost function, incorporating the statistical properties of possible desired sound fields, for general linear-least-squares-based sound field synthesis methods, including pressure matching and (weighted) mode matching, whereas most of the current methods are applicable only to the pressure-matching method. An efficient greedy algorithm for minimizing the proposed cost function is also derived. Numerical experiments indicated that a high reproduction accuracy can be achieved by the placement optimized by the proposed method compared with the empirically used regular placement.

* Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021

Via

Access Paper or Ask Questions

Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation

Nov 22, 2021

Shoichi Koyama, Keisuke Kimura, Natsuki Ueno

Figure 1 for Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation

Figure 2 for Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation

Figure 3 for Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation

Figure 4 for Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation

Abstract:Sound field reproduction methods based on numerical optimization, which aim to minimize the error between synthesized and desired sound fields, are useful in many practical scenarios because of their flexibility in the array geometry of loudspeakers. However, the reproduction performance of these methods in a practical environment has not been sufficiently investigated. We evaluate weighted mode matching, which is a sound field reproduction method based on the spherical wavefunction expansion of the sound field, in comparison with conventional pressure matching. We also introduce a method of infinite-dimensional harmonic analysis for estimating the expansion coefficients of the sound field from microphone measurements. Experimental results indicated that weighted mode matching using the expansion coefficients of the transfer functions estimated by the infinite-dimensional harmonic analysis outperforms conventional pressure matching, especially when the number of microphones is small.

* Accepted to International Conference on Immersive and 3D Audio (I3DA) 2021

Via

Access Paper or Ask Questions

Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations

Oct 12, 2021

Ryosuke Horiuchi, Shoichi Koyama, Juliano G. C. Ribeiro, Natsuki Ueno, Hiroshi Saruwatari

Figure 1 for Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations

Figure 2 for Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations

Figure 3 for Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations

Abstract:A method to estimate an acoustic field from discrete microphone measurements is proposed. A kernel-interpolation-based method using the kernel function formulated for sound field interpolation has been used in various applications. The kernel function with directional weighting makes it possible to incorporate prior information on source directions to improve estimation accuracy. However, in prior studies, parameters for directional weighting have been empirically determined. We propose a method to optimize these parameters using observation values, which is particularly useful when prior information on source directions is uncertain. The proposed algorithm is based on discretization of the parameters and representation of the kernel function as a weighted sum of sub-kernels. Two types of regularization for the weights, $L_1$ and $L_2$, are investigated. Experimental results indicate that the proposed method achieves higher estimation accuracy than the method without kernel learning.

* Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021

Via

Access Paper or Ask Questions

MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods

Jun 21, 2021

Shoichi Koyama, Tomoya Nishida, Keisuke Kimura, Takumi Abe, Natsuki Ueno, Jesper Brunnström

Figure 1 for MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods

Figure 2 for MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods

Figure 3 for MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods

Figure 4 for MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods

Abstract:A new impulse response (IR) dataset called "MeshRIR" is introduced. Currently available datasets usually include IRs at an array of microphones from several source positions under various room conditions, which are basically designed for evaluating speech enhancement and distant speech recognition methods. On the other hand, methods of estimating or controlling spatial sound fields have been extensively investigated in recent years; however, the current IR datasets are not applicable to validating and comparing these methods because of the low spatial resolution of measurement points. MeshRIR consists of IRs measured at positions obtained by finely discretizing a spatial region. Two subdatasets are currently available: one consists of IRs in a three-dimensional cuboidal region from a single source, and the other consists of IRs in a two-dimensional square region from an array of 32 sources. Therefore, MeshRIR is suitable for evaluating sound field analysis and synthesis methods. This dataset is freely available at \url{https://sh01k.github.io/MeshRIR/} with some codes of sample applications.

Via

Access Paper or Ask Questions