Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuzhou Ye

ImmersiveFlow: Stereo-to-7.1.4 spatial audio generation with flow matching

Jan 19, 2026

Zining Liang, Runbang Wang, Xuzhou Ye, Qiuqiang Kong

Abstract:Immersive spatial audio has become increasingly critical for applications ranging from AR/VR to home entertainment and automotive sound systems. However, existing generative methods remain constrained to low-dimensional formats such as binaural audio and First-Order Ambisonics (FOA). Binaural rendering is inherently limited to headphone playback, while FOA suffers from spatial aliasing and insufficient resolution for high-frequency. To overcome these limitations, we introduce ImmersiveFlow, the first end-to-end generative framework that directly synthesizes discrete 7.1.4 format spatial audio from stereo input. ImmersiveFlow leverages Flow Matching to learn trajectories from stereo inputs to multichannel spatial features within a pretrained VAE latent space. At inference, the Flow Matching model predicted latent features are decoded by the VAE and converted into the final 7.1.4 waveform. Comprehensive objective and subjective evaluations demonstrate that our method produces perceptually rich sound fields and enhanced externalization, significantly outperforming traditional upmixing techniques. Code implementations and audio samples are provided at: https://github.com/violet-audio/ImmersiveFlow.

* 5 pages, 3 figures, 2 tables

Via

Access Paper or Ask Questions

Binaural Rendering of Ambisonic Signals by Neural Networks

Nov 04, 2022

Yin Zhu, Qiuqiang Kong, Junjie Shi, Shilei Liu, Xuzhou Ye, Ju-chiang Wang, Junping Zhang

Figure 1 for Binaural Rendering of Ambisonic Signals by Neural Networks

Figure 2 for Binaural Rendering of Ambisonic Signals by Neural Networks

Figure 3 for Binaural Rendering of Ambisonic Signals by Neural Networks

Figure 4 for Binaural Rendering of Ambisonic Signals by Neural Networks

Abstract:Binaural rendering of ambisonic signals is of broad interest to virtual reality and immersive media. Conventional methods often require manually measured Head-Related Transfer Functions (HRTFs). To address this issue, we collect a paired ambisonic-binaural dataset and propose a deep learning framework in an end-to-end manner. Experimental results show that neural networks outperform the conventional method in objective metrics and achieve comparable subjective metrics. To validate the proposed framework, we experimentally explore different settings of the input features, model structures, output features, and loss functions. Our proposed system achieves an SDR of 7.32 and MOSs of 3.83, 3.58, 3.87, 3.58 in quality, timbre, localization, and immersion dimensions.

Via

Access Paper or Ask Questions