Abstract: Expressive music synthesis (EMS) for violin performance is a challenging task due to disagreement among performers in the interpretation of expressive musical terms (EMTs), the scarcity of labeled recordings, and the limited generalization ability of synthesis models. These challenges create trade-offs among model effectiveness, diversity of generated results, and controllability of the synthesis system, making a comparative study of EMS model design essential. This paper explores two violin EMS approaches. The end-to-end approach is a modification of a state-of-the-art text-to-speech generator. The parameter-controlled approach is based on a simple parameter sampling process that renders note lengths and other expression parameters in a form compatible with MIDI-DDSP. We study these two approaches (three model variants in total) through objective and subjective experiments and discuss several key issues of EMS based on the results.
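To make the parameter-controlled idea concrete, the sketch below illustrates one way per-note expression parameters and note-length scaling could be drawn from simple distributions before being passed to a downstream synthesizer. The parameter names, distributions, and ranges are our own assumptions for illustration; they are not the paper's actual configuration or MIDI-DDSP's real interface.

```python
# Minimal sketch of a parameter-controlled sampling step: draw note-level
# expression controls from simple distributions. All names and ranges here
# are hypothetical placeholders, not the paper's settings.
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_note_expression(n_notes: int) -> dict:
    """Sample per-note expression controls for n_notes notes."""
    return {
        # multiplicative stretch applied to each notated note length
        "length_scale": rng.normal(loc=1.0, scale=0.1, size=n_notes).clip(0.5, 1.5),
        # loudness-like control in [0, 1]
        "volume": rng.beta(a=5.0, b=2.0, size=n_notes),
        # vibrato-depth-like control in [0, 1]
        "vibrato": rng.beta(a=2.0, b=5.0, size=n_notes),
    }

params = sample_note_expression(n_notes=16)
print({k: v[:3].round(3) for k, v in params.items()})
```

In such a scheme, the sampled controls stay human-readable, so a user can inspect or override individual note parameters before synthesis, which is the controllability trade-off the abstract contrasts with the end-to-end approach.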
Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. Constructing such a transformative scheme depends on a benchmark corpus with a comprehensive data infrastructure, and assembling a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high-quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamics, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrases, and expressive content from audio, video, and motion data, and the generation of musicians' body motion from given music audio. The dataset and code are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).
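As a rough illustration of how note-level annotations can be combined with motion-capture data in cross-modal tasks, the sketch below slices motion frames by note onset and offset times. The column names, frame rate, and array layout are hypothetical placeholders for illustration only, not the actual MOSA file schema.

```python
# Illustrative sketch: align per-note annotations with motion-capture frames.
# The schema (column names, frame rate, joint count) is assumed, not MOSA's.
import numpy as np
import pandas as pd

MOCAP_FPS = 120  # assumed motion-capture frame rate

# Hypothetical note table: one row per note, onset/offset in seconds.
notes = pd.DataFrame({
    "onset_sec": [0.00, 0.52, 1.10],
    "offset_sec": [0.50, 1.05, 1.60],
    "pitch": [67, 69, 71],
    "dynamic": ["mf", "f", "f"],
})

# Hypothetical motion array: (n_frames, n_joints, 3) 3-D joint positions.
motion = np.zeros((int(2.0 * MOCAP_FPS), 21, 3))

def motion_segment_for_note(motion: np.ndarray, onset: float, offset: float) -> np.ndarray:
    """Return the motion frames that fall within a note's time span."""
    start, end = int(onset * MOCAP_FPS), int(offset * MOCAP_FPS)
    return motion[start:end]

for _, note in notes.iterrows():
    seg = motion_segment_for_note(motion, note.onset_sec, note.offset_sec)
    print(f"pitch {note.pitch} ({note.dynamic}): {len(seg)} motion frames")
```

Pairings of this kind (note-level labels on one side, time-indexed motion or audio frames on the other) are the kind of cross-modal supervision the beat, phrase, and expressive-content detection tasks and the audio-to-motion generation task would draw on.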