Title: Detecting Synthetic Lyrics with Few-Shot Inference

Jun 21, 2024

Yanis Labrak, Gabriel Meseguer-Brocal, Elena V. Epure

Abstract: In recent years, machine-generated content in music has gained significant popularity, with large language models used effectively to produce human-like lyrics in various styles, themes, and linguistic structures. This technological advance supports artists in their creative processes but also raises issues of authorship infringement, consumer satisfaction, and content spamming. Addressing these challenges requires methods for detecting generated lyrics. However, existing work on machine-generated content detection, in both methods and datasets, has not yet focused on this specific modality or on creative text in general. In response, we have curated the first dataset of high-quality synthetic lyrics and conducted a comprehensive quantitative evaluation of various few-shot content detection approaches, testing their generalization capabilities and complementing this with a human evaluation. Our best few-shot detector, based on LLM2Vec, surpasses stylistic and statistical methods, which have been shown to be competitive in other domains at distinguishing human-written from machine-generated content. It also generalizes well to new artists and models, and effectively detects post-generation paraphrasing. This study emphasizes the need for further research on creative content detection, particularly in terms of generalization and scalability with larger song catalogs. All datasets, pre-processing scripts, and code are publicly available on GitHub and Hugging Face under the Apache 2.0 license.

* Under review
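
To make the idea of few-shot detection over text embeddings concrete, here is a minimal sketch. It is not the authors' implementation, and the abstract does not state which decision rule the detector uses. The sketch assumes a k-nearest-neighbour vote over a small labelled support set. The `embed` function is a toy stand-in (hashed character trigrams) and not LLM2Vec, and the names `embed`, `cosine`, `knn_predict`, and the example lyric lines are all hypothetical.

```python
import math
import zlib
from collections import Counter

DIM = 256  # size of the toy embedding space (arbitrary, assumed)


def embed(text):
    """Toy stand-in for a real text encoder: hashed character-trigram
    counts, L2-normalised. This is NOT LLM2Vec; a real system would
    replace this function with an embedding model."""
    vec = [0.0] * DIM
    lowered = text.lower()
    for i in range(len(lowered) - 2):
        bucket = zlib.crc32(lowered[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))


def knn_predict(query, support, k=3):
    """Label `query` by majority vote among its k most similar support items.

    `support` is a list of (text, label) pairs: the few labelled examples."""
    q = embed(query)
    ranked = sorted(support, key=lambda item: cosine(q, embed(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical support set: a handful of labelled lines per class.
SUPPORT = [
    ("rain on the window and your coat still by the door", "human"),
    ("we drove all night with the radio low", "human"),
    ("in the neon glow of endless dreams we rise", "generated"),
    ("shining hearts united under infinite skies", "generated"),
]

if __name__ == "__main__":
    print(knn_predict("the radio played low while we drove", SUPPORT, k=1))
```

With a real encoder, only `embed` would change; the support set and the voting rule stay the same, which is what makes the approach few-shot: no detector weights are trained.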
