Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tongyu Lu

MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection

May 27, 2025

Tongyu Lu, Charlotta-Marlena Geist, Jan Melechovsky, Abhinaba Roy, Dorien Herremans

Abstract:We propose MelodySim, a melody-aware music similarity model and dataset for plagiarism detection. First, we introduce a novel method to construct a dataset with focus on melodic similarity. By augmenting Slakh2100; an existing MIDI dataset, we generate variations of each piece while preserving the melody through modifications such as note splitting, arpeggiation, minor track dropout (excluding bass), and re-instrumentation. A user study confirms that positive pairs indeed contain similar melodies, with other musical tracks significantly changed. Second, we develop a segment-wise melodic-similarity detection model that uses a MERT encoder and applies a triplet neural network to capture melodic similarity. The resultant decision matrix highlights where plagiarism might occur. Our model achieves high accuracy on the MelodySim test set.

Via

Access Paper or Ask Questions

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

Feb 11, 2025

Abhinaba Roy, Renhang Liu, Tongyu Lu, Dorien Herremans

Abstract:We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 200,000 freely licensed instrumental tracks from the renowned Jamendo platform. The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata. We also introduce a retrieval system that leverages both musical features and metadata to identify similar songs, which are then used to fill in missing metadata using a local large language model (LLLM). This approach allows us to provide a more comprehensive and informative dataset for researchers working on music-language understanding tasks. We validate this approach quantitatively with five different measurements. By making the JamendoMaxCaps dataset publicly available, we provide a high-quality resource to advance research in music-language understanding tasks such as music retrieval, multimodal representation learning, and generative music models.

* 8 pages, 5 figures

Via

Access Paper or Ask Questions

ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement

Feb 06, 2025

Keshav Bhandari, Sungkyun Chang, Tongyu Lu, Fareza R. Enus, Louis B. Bradshaw, Dorien Herremans, Simon Colton

Abstract:Deep learning has enabled remarkable advances in style transfer across various domains, offering new possibilities for creative content generation. However, in the realm of symbolic music, generating controllable and expressive performance-level style transfers for complete musical works remains challenging due to limited datasets, especially for genres such as jazz, and the lack of unified models that can handle multiple music generation tasks. This paper presents ImprovNet, a transformer-based architecture that generates expressive and controllable musical improvisations through a self-supervised corruption-refinement training strategy. ImprovNet unifies multiple capabilities within a single model: it can perform cross-genre and intra-genre improvisations, harmonize melodies with genre-specific styles, and execute short prompt continuation and infilling tasks. The model's iterative generation framework allows users to control the degree of style transfer and structural similarity to the original composition. Objective and subjective evaluations demonstrate ImprovNet's effectiveness in generating musically coherent improvisations while maintaining structural relationships with the original pieces. The model outperforms Anticipatory Music Transformer in short continuation and infilling tasks and successfully achieves recognizable genre conversion, with 79\% of participants correctly identifying jazz-style improvisations. Our code and demo page can be found at https://github.com/keshavbhandari/improvnet.

* 10 pages, 6 figures

Via

Access Paper or Ask Questions

Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Jan 27, 2024

Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc Van Gool

Figure 1 for Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Figure 2 for Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Figure 3 for Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Figure 4 for Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Abstract:The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes. Prior works utilize keyframes, feature propagation, or cross-frame attention to address these issues. By contrast, we are the first to harness vanishing point (VP) priors for more effective segmentation. Intuitively, objects near VPs (i.e., away from the vehicle) are less discernible. Moreover, they tend to move radially away from the VP over time in the usual case of a forward-facing camera, a straight road, and linear forward motion of the vehicle. Our novel, efficient network for VSS, named VPSeg, incorporates two modules that utilize exactly this pair of static and dynamic VP priors: sparse-to-dense feature mining (DenseVP) and VP-guided motion fusion (MotionVP). MotionVP employs VP-guided motion estimation to establish explicit correspondences across frames and help attend to the most relevant features from neighboring frames, while DenseVP enhances weak dynamic features in distant regions around VPs. These modules operate within a context-detail framework, which separates contextual features from high-resolution local features at different input resolutions to reduce computational costs. Contextual and local features are integrated through contextualized motion attention (CMA) for the final prediction. Extensive experiments on two popular driving segmentation benchmarks, Cityscapes and ACDC, demonstrate that VPSeg outperforms previous SOTA methods, with only modest computational overhead.

Via

Access Paper or Ask Questions

Word Representation for Rhythms

Sep 03, 2020

Tongyu Lu, Lyucheng Yan, Gus Xia

Figure 1 for Word Representation for Rhythms

Figure 2 for Word Representation for Rhythms

Figure 3 for Word Representation for Rhythms

Figure 4 for Word Representation for Rhythms

Abstract:This paper proposes a word representation strategy for rhythm patterns. Using 1034 pieces of Nottingham Dataset, a rhythm word dictionary whose size is 450 (without control tokens) is generated. BERT model is created to explore syntactic potentials of rhythm words. Our model is able to find overall music structures and cluster different meters. In a larger scheme, a think mode - music as language - is proposed for systematic considerations.

* Construction of this paper is ill. The first author decides that this paper needs thorough rewriting and more experiments should be done

Via

Access Paper or Ask Questions