Title: SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation

Apr 21, 2025

Yue Li, Weizhi Liu, Dongdong Lin

[Figures 1-4 for SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation]

Abstract: The accelerated advancement of speech generative models has given rise to security issues, including model infringement and unauthorized abuse of content. Although existing generative watermarking techniques have proposed corresponding solutions, most methods require substantial computational overhead and training costs. In addition, some methods have limitations in robustness when handling variable-length inputs. To tackle these challenges, we propose SOLIDO, a novel generative watermarking method that integrates parameter-efficient fine-tuning with speech watermarking through low-rank adaptation (LoRA) for speech diffusion models. Concretely, the watermark encoder converts the watermark to align with the input of diffusion models. To achieve precise watermark extraction from variable-length inputs, the watermark decoder based on depthwise separable convolution is designed for watermark recovery. To further enhance speech generation performance and watermark extraction capability, we propose a speech-driven lightweight fine-tuning strategy, which reduces computational overhead through LoRA. Comprehensive experiments demonstrate that the proposed method ensures high-fidelity watermarked speech even at a large capacity of 2000 bps. Furthermore, against common individual and compound speech attacks, SOLIDO achieves a maximum average extraction accuracy of 99.20% and 98.43%, respectively. It surpasses other state-of-the-art methods by nearly 23% in resisting time-stretching attacks.
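To illustrate why LoRA reduces fine-tuning cost, the sketch below shows the standard low-rank update, in which a frozen weight matrix W is augmented by a trainable product B·A scaled by alpha/rank. This is the generic LoRA formulation, not SOLIDO's actual code: the layer sizes, rank, and the function names `lora_param_counts` and `lora_forward` are assumptions chosen for illustration, since the abstract does not give these details.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for full fine-tuning vs. a LoRA adapter.

    Full fine-tuning updates every entry of the d_out x d_in matrix.
    LoRA trains only A (rank x d_in) and B (d_out x rank).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x) with W kept frozen."""
    base = matvec(W, x)            # frozen pretrained path
    low = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / rank
    return [b + scale * l for b, l in zip(base, low)]


# Hypothetical sizes: a 512 x 512 layer adapted with rank 8.
full, lora = lora_param_counts(512, 512, 8)
print(full, lora)
```

With B initialised to zeros, as is conventional for LoRA, the adapted layer reproduces the pretrained output exactly at the start of fine-tuning, and only the small A and B matrices receive gradient updates.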
