Title: CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

Apr 24, 2024

Batu Guan, Yao Wan, Zhangqian Bi, Zheng Wang, Hongyu Zhang, Yulei Sui, Pan Zhou, Lichao Sun

Abstract: As Large Language Models (LLMs) are increasingly used to automate code generation, it is often desirable to know whether a piece of code is AI-generated and, if so, by which model, especially for purposes such as protecting intellectual property (IP) in industry and preventing academic misconduct in education. Incorporating watermarks into machine-generated content is one way to provide code provenance, but existing solutions are restricted to a single bit or lack flexibility. We present CodeIP, a new watermarking technique for LLM-based code generation. CodeIP enables the insertion of multi-bit information while preserving the semantics of the generated code, improving the strength and diversity of the inserted watermark. This is achieved by training a type predictor to predict the grammar type of the next token, which enhances the syntactic and semantic correctness of the generated code. Experiments on a real-world dataset across five programming languages showcase the effectiveness of CodeIP.

* 13 pages, 7 figures
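
The idea in the abstract, biasing generation so that the output carries a multi-bit message while a grammar-type prediction steers each token toward valid syntax, can be illustrated with a toy sketch. This is not the paper's implementation: the tiny vocabulary, the hash-based "green" split keyed on the message, step, and previous token, the rule-based `predict_next_type` (a stand-in for a trained type predictor), the random numbers standing in for model logits, and the constants `DELTA` and `GAMMA` are all assumptions made for illustration.

```python
import hashlib
import random

# Hypothetical toy vocabulary: token -> grammar type.
VOCAB = {
    "x": "NAME", "y": "NAME", "z": "NAME",
    "total": "NAME", "count": "NAME", "idx": "NAME",
    "=": "OP", "+": "OP", "-": "OP", "*": "OP", "/": "OP", "%": "OP",
}
DELTA = 2.0  # assumed logit bonus for tokens that encode the message
GAMMA = 4.0  # assumed logit bonus for tokens of the predicted grammar type


def is_green(message: int, step: int, prev_token: str, token: str) -> bool:
    """Keyed pseudo-random split: roughly half the vocabulary is 'green'
    for any given (message, step, previous token)."""
    key = f"{message}|{step}|{prev_token}|{token}".encode()
    return hashlib.sha256(key).digest()[0] % 2 == 0


def predict_next_type(prev_token: str) -> str:
    """Stand-in for a trained type predictor: operands and operators alternate."""
    return "OP" if VOCAB.get(prev_token) == "NAME" else "NAME"


def generate(message: int, length: int, seed: int = 0) -> list[str]:
    """Greedy decoding over biased scores; rng.random() replaces real model logits."""
    rng = random.Random(seed)
    tokens, prev = [], "<s>"
    for step in range(length):
        wanted_type = predict_next_type(prev)
        scores = {}
        for tok, tok_type in VOCAB.items():
            score = rng.random()
            if is_green(message, step, prev, tok):
                score += DELTA  # watermark bias
            if tok_type == wanted_type:
                score += GAMMA  # grammar bias
            scores[tok] = score
        prev = max(scores, key=scores.get)
        tokens.append(prev)
    return tokens


def extract(tokens: list[str], num_bits: int) -> int:
    """Return the candidate message under which the most tokens are green."""
    def green_count(message: int) -> int:
        prev, count = "<s>", 0
        for step, tok in enumerate(tokens):
            count += is_green(message, step, prev, tok)
            prev = tok
        return count
    return max(range(2 ** num_bits), key=green_count)


if __name__ == "__main__":
    watermarked = generate(message=11, length=80)
    print(" ".join(watermarked[:12]))
    print("recovered message:", extract(watermarked, num_bits=4))
```

Because `GAMMA` is larger than `DELTA` plus the range of the stand-in logits, the grammar bias always wins in this sketch, and the watermark only chooses among tokens of the predicted type. Extraction here is a brute-force search over all candidate messages, which is practical only for a handful of bits.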
