Abstract: Radio frequency (RF) signals have proven flexible for human silhouette segmentation (HSS) in complex environments. Existing studies are mainly based on a one-shot approach, which lacks a coherent projection ability from the RF domain. Additionally, spatio-temporal patterns have not been fully explored for human motion dynamics in HSS. Therefore, we propose a two-stage Sequential Diffusion Model (SDM) that progressively synthesizes high-quality segmentation while jointly accounting for motion dynamics. Cross-view transformation blocks are devised to guide the diffusion model in a multi-scale manner, comprehensively characterizing human-related patterns in an individual frame, such as directional projection from signal planes. Moreover, spatio-temporal blocks are devised to fine-tune the frame-level model to incorporate spatio-temporal contexts and motion dynamics, enhancing the consistency of the segmentation maps. Comprehensive experiments on a public benchmark, HIBER, demonstrate the state-of-the-art performance of our method, with an IoU of 0.732. Our code is available at https://github.com/ph-w2000/SDM.
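For readers unfamiliar with the reported metric, the following is a minimal sketch of Intersection over Union (IoU) between a predicted and a ground-truth binary silhouette mask. It is not taken from the SDM code base: the function name `binary_iou`, the toy masks, and the empty-mask convention are assumptions made for illustration, and the abstract does not state how IoU is aggregated over frames.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over Union between two binary masks of equal shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # pixel is foreground in both masks
            union += p_on or t_on          # pixel is foreground in at least one mask
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 masks, invented for illustration only.
pred_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
true_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
print(binary_iou(pred_mask, true_mask))
```

In this toy case the masks share 4 foreground pixels out of 6 in their union, so the score is 4/6. A dataset-level figure such as the one quoted above would typically average such per-frame scores, although the exact protocol is not given here.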
Abstract: Robust audio anti-spoofing has become increasingly challenging due to recent advances in deepfake techniques. While spectrograms have demonstrated their capability for anti-spoofing, the complementary information present in multi-order spectral patterns has not been well explored, which limits their effectiveness against varying spoofing attacks. Therefore, we propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations. Specifically, spectral patterns up to second order are fused in a coarse-to-fine manner, and two branches are designed for the fine-level fusion from the spectral and temporal contexts. A reconstruction from the fused representation to the input spectrograms further reduces the potential loss of fused information. Our method achieves state-of-the-art performance with an EER of 0.77% on a widely used dataset: ASVspoof2019 LA Challenge.
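As background on the reported metric, the sketch below approximates the equal error rate (EER): the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). It is not the official ASVspoof evaluation code or part of S2pecNet. The function name `equal_error_rate`, the toy scores, and the assumption that higher scores mean "more likely bona fide" are all illustrative, and the threshold sweep omits the interpolation that standard toolkits may apply.

```python
from typing import Sequence


def equal_error_rate(bonafide_scores: Sequence[float], spoof_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping every observed score as a decision threshold.

    Assumes higher scores indicate bona fide speech (an assumption of this sketch).
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # extra threshold that rejects everything
    best_gap = float("inf")
    eer = 1.0
    for thr in thresholds:
        # False acceptance: a spoofed sample scores at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        # False rejection: a bona fide sample scores below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(bonafide, spoof))
```

With these toy scores, a threshold of 0.6 accepts one of four spoofed samples and rejects one of four bona fide samples, so both error rates are 0.25. An EER of 0.77% means the two error rates balance at well under one percent.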