Two-Stage Unet with Gated-Conv Fusion for Binaural Audio Synthesis.

Bibliographic Details
Title: Two-Stage Unet with Gated-Conv Fusion for Binaural Audio Synthesis.
Authors: Zhang, Wenjie; He, Changjun; Cao, Yinghan; Xu, Shiyun; Wang, Mingjiang (mjwang@hit.edu.cn)
Source: Sensors (ISSN 1424-8220), March 2025, Vol. 25, Issue 6, Article 1790, 18 pp.
Subject Terms: binaural audio; space perception; motion capture (human mechanics); information resources
Abstract: Binaural audio is crucial for creating immersive auditory experiences. However, due to the high cost and technical complexity of capturing binaural audio in real-world environments, there has been increasing interest in synthesizing binaural audio from monaural sources. In this paper, we propose a two-stage framework for binaural audio synthesis. Specifically, monaural audio is initially transformed into a preliminary binaural signal, and the shared common portion across the left and right channels, as well as the distinct differential portion in each channel, are extracted. Subsequently, the POS-ORI self-attention module (POSA) is introduced to integrate spatial information of the sound sources and capture their motion. Based on this representation, the common and differential components are separately reconstructed. The gated-convolutional fusion module (GCFM) is then employed to combine the reconstructed components and generate the final binaural audio. Experimental results demonstrate that the proposed method can accurately synthesize binaural audio and achieves state-of-the-art performance in phase estimation (Phase-ℓ2: 0.789, Wave-ℓ2: 0.147, Amplitude-ℓ2: 0.036). [ABSTRACT FROM AUTHOR]
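Two ideas in the abstract can be sketched concretely: splitting a preliminary binaural pair into a common and a differential component, and recombining components with a gated convolution. Below is a minimal, hypothetical PyTorch sketch of those two operations only. The paper's actual POSA and GCFM designs are not described in this record, so every layer, function name, and shape here is an illustrative assumption, not the authors' implementation.

# Illustrative sketch only; assumes PyTorch. The common/diff split is the
# standard mid/side decomposition; GatedConvFusion is a generic gated 1-D
# convolution (tanh features modulated by a sigmoid gate), a stand-in for
# the paper's GCFM, whose internals this record does not specify.
import torch
import torch.nn as nn

class GatedConvFusion(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.feature_conv = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate in [0, 1] selects how much of each feature passes through.
        return torch.tanh(self.feature_conv(x)) * torch.sigmoid(self.gate_conv(x))

def split_common_diff(left: torch.Tensor, right: torch.Tensor):
    # Shared portion of both channels, and the per-channel difference
    # that carries the spatial (interaural) cues.
    common = 0.5 * (left + right)
    diff = 0.5 * (left - right)
    return common, diff

def merge_common_diff(common: torch.Tensor, diff: torch.Tensor):
    # Exact inverse of the decomposition above.
    return common + diff, common - diff

if __name__ == "__main__":
    # Toy preliminary binaural pair: (batch, channels, samples).
    left = torch.randn(1, 1, 16000)
    right = torch.randn(1, 1, 16000)
    common, diff = split_common_diff(left, right)
    fusion = GatedConvFusion(channels=1)
    # Stand-in for the fusion stage: gate each reconstructed component,
    # then invert the split to recover the two output channels.
    left_hat, right_hat = merge_common_diff(fusion(common), fusion(diff))
    print(left_hat.shape, right_hat.shape)

Note the split is lossless (merge exactly inverts it), which is presumably why separating the common and differential parts lets each be reconstructed without discarding interaural information.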
Database: Academic Search Complete
More Details
ISSN: 1424-8220
DOI: 10.3390/s25061790
Published in: Sensors
Language: English