Abstract:Different methods can be employed to render virtual reverberation, often requiring substantial information about the room's geometry and the acoustic characteristics of the surfaces. However, fully comprehensive approaches that account for all aspects of a given environment may be computationally costly and redundant from a perceptual standpoint. For these methods, achieving a trade-off between perceptual authenticity and model's complexity becomes a relevant challenge. This study investigates this compromise through the use of geometrical acoustics to render Ambisonics-based binaural reverberation. Its precision is determined, among other factors, by its fidelity to the room's geometry and to the acoustic properties of its materials. The purpose of this study is to investigate the impact of simplifying the room geometry and the frequency resolution of absorption coefficients on the perception of reverberation within a virtual sound scene. Several decimated models based on a single room were perceptually evaluated using the a multi-stimulus comparison method. Additionally, these differences were numerically assessed through the calculation of acoustic parameters of the reverberation. According to numerical and perceptual evaluations, lowering the frequency resolution of absorption coefficients can have a significant impact on the perception of reverberation, while a less notable impact was observed when decimating the geometry of the model.
Abstract:An individualised head-related transfer function (HRTF) is essential for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make this measurement more efficient HRTF upsampling has been exploited in the past where a high-resolution HRTF is created from a low-resolution one. This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for convenient use with a convolutional super-resolution generative adversarial network (SRGAN). This new approach is benchmarked against two baselines: barycentric upsampling and a HRTF selection approach. Experimental results show that the proposed method outperforms both baselines in terms of log-spectral distortion (LSD) and localisation performance using perceptual models when the input HRTF is sparse.
Abstract:As spatial audio is enjoying a surge in popularity, data-driven machine learning techniques that have been proven successful in other domains are increasingly used to process head-related transfer function measurements. However, these techniques require much data, whereas the existing datasets are ranging from tens to the low hundreds of datapoints. It therefore becomes attractive to combine multiple of these datasets, although they are measured under different conditions. In this paper, we first establish the common ground between a number of datasets, then we investigate potential pitfalls of mixing datasets. We perform a simple experiment to test the relevance of the remaining differences between datasets when applying machine learning techniques. Finally, we pinpoint the most relevant differences.