Title: MuLan: A Joint Embedding of Music Audio and Natural Language

Aug 26, 2022

Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel P. W. Ellis

Abstract: Music tagging and content-based retrieval systems have traditionally been built on pre-defined ontologies that cover a rigid set of music attributes or text queries. This paper presents MuLan, a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural-language music descriptions. MuLan is a two-tower, joint audio-text embedding model trained on 44 million music recordings (370K hours) paired with weakly associated, free-form text annotations. Because it is compatible with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while enabling true zero-shot functionality. We demonstrate the versatility of the MuLan embeddings with a range of experiments, including transfer learning, zero-shot music tagging, language understanding in the music domain, and cross-modal retrieval.
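Zero-shot tagging and cross-modal retrieval in a two-tower model both come down to ranking text descriptions by their similarity to an audio clip in the shared embedding space. The sketch below assumes that both towers have already produced fixed-length vectors and that similarity is measured by cosine (a common choice for two-tower models; this page does not state MuLan's exact metric). The function names, tag strings, and toy 3-dimensional vectors are hypothetical stand-ins for real tower outputs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product becomes a cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of the same dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_tag(
    audio_embedding: Sequence[float],
    text_embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank free-form text descriptions against one audio embedding.

    Hypothetical helper: in a real system, text_embeddings would come from
    the text tower applied to arbitrary descriptions, not a fixed tag list.
    """
    scores = [
        (text, cosine_similarity(audio_embedding, emb))
        for text, emb in text_embeddings.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy vectors standing in for audio-tower and text-tower outputs.
audio = [0.9, 0.1, 0.0]
tags = {
    "upbeat jazz": [1.0, 0.0, 0.0],
    "sad piano ballad": [0.0, 1.0, 0.0],
    "heavy metal": [0.0, 0.0, 1.0],
}
result = zero_shot_tag(audio, tags, top_k=2)
print(result)
```

Because the ranking only needs embeddings, swapping the roles (one text query against many audio embeddings) gives text-to-music retrieval with the same function.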

* To appear in ISMIR 2022
