Title: WaterPark: A Robustness Assessment of Language Model Watermarking

Nov 20, 2024

Jiacheng Liang, Zian Wang, Lauren Hong, Shouling Ji, Ting Wang

Abstract: To mitigate the misuse of large language models (LLMs), such as disinformation, automated phishing, and academic cheating, there is a pressing need for the ability to identify LLM-generated text. Watermarking has emerged as one promising solution: it embeds statistical signals into an LLM's generative process and subsequently verifies whether a given text was produced by that LLM. Various watermarking methods ("watermarkers") have been proposed; yet, due to the lack of unified evaluation platforms, many critical questions remain under-explored: i) What are the strengths and limitations of various watermarkers, especially their robustness to attacks? ii) How do various design choices impact their robustness? iii) How can watermarkers be operated optimally in adversarial environments? To fill this gap, we systematize existing LLM watermarkers and watermark removal attacks, mapping out their design spaces. We then develop WaterPark, a unified platform that integrates 10 state-of-the-art watermarkers and 12 representative attacks. More importantly, leveraging WaterPark, we conduct a comprehensive assessment of existing watermarkers, unveiling the impact of various design choices on their attack robustness. For instance, a watermarker's resilience to increasingly intensive attacks hinges on its context dependency. We further explore best practices for operating watermarkers in adversarial environments. For instance, using a generic detector alongside a watermark-specific detector improves the security of vulnerable watermarkers. We believe our study sheds light on current LLM watermarking techniques, while WaterPark serves as a valuable testbed to facilitate future research.
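The abstract describes watermarking in one line: plant a statistical signal at generation time, then verify it later. The well-known green-list scheme of Kirchenbauer et al. (2023) makes that idea concrete. The sketch below is illustrative only and is not claimed to match any WaterPark implementation. It rests on several assumptions. The vocabulary is a range of integer token ids. The "model" is a toy sampler that emits only green tokens (the "hard" variant; the soft variant instead adds a bias to green-token logits). The names `green_list`, `generate_watermarked`, and `detect_z_score` are hypothetical. Each position's green list is seeded by a secret key and the previous token, which is one simple form of the context dependency the abstract refers to.

```python
import hashlib
import math
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def green_list(prev_token: int, vocab_size: int, gamma: float, key: str) -> frozenset[int]:
    """Pseudo-randomly pick a gamma fraction of the vocabulary (hypothetical helper).

    The choice is seeded by a secret key and the previous token, so the same
    context always yields the same green list.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return frozenset(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def generate_watermarked(length: int, vocab_size: int, gamma: float, key: str, seed: int = 0) -> list[int]:
    """Toy stand-in for an LLM (assumption): every next token is drawn only from
    the green list of the previous token, i.e. the "hard" watermark variant."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < length:
        greens = sorted(green_list(tokens[-1], vocab_size, gamma, key))
        tokens.append(rng.choice(greens))
    return tokens


def detect_z_score(tokens: list[int], vocab_size: int, gamma: float, key: str) -> float:
    """One-proportion z-test: how far the green-token count exceeds the
    gamma * T hits expected from text written without the key."""
    t = len(tokens) - 1  # the first token has no predecessor, so it is not scored
    if t <= 0:
        raise ValueError("need at least two tokens to score")
    hits = sum(
        tok in green_list(prev, vocab_size, gamma, key)
        for prev, tok in zip(tokens, tokens[1:])
    )
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


if __name__ == "__main__":
    wm = generate_watermarked(200, 1000, 0.25, "secret")
    rng = random.Random(1)
    plain = [rng.randrange(1000) for _ in range(200)]
    print(f"watermarked z = {detect_z_score(wm, 1000, 0.25, 'secret'):.1f}")
    print(f"plain z       = {detect_z_score(plain, 1000, 0.25, 'secret'):.1f}")
```

Detection flags a text when its z-score exceeds a fixed threshold (4 is a common choice). Text written without the key, or scored with the wrong key, should land near z = 0. A removal attack such as paraphrasing works by replacing enough tokens that the green-token count falls back toward its chance level of gamma times the number of scored tokens. In this sketch, changing one token disturbs two scored positions: its own, and the next one, whose green list it seeds.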

* 22 pages
