Abstract: In recent years, advancements in neural network designs and the availability of large-scale labeled datasets have led to significant improvements in the accuracy of piano transcription models. However, most previous work has focused on high-performance offline transcription without deliberate consideration of model size. The goal of this work is to achieve real-time inference for piano transcription while ensuring both high performance and a lightweight model. To this end, we propose novel convolutional recurrent neural network architectures that redesign an existing autoregressive piano transcription model. First, we extend the acoustic module by adding a frequency-conditioned FiLM layer to the CNN module to adapt the convolutional filters along the frequency axis. Second, we improve note-state sequence modeling with a pitchwise LSTM that focuses on note-state transitions within a note. In addition, we augment the autoregressive connection with an enhanced recursive context. Using these components, we propose two types of models: one for high performance and the other for high compactness. Through extensive experiments, we show that the proposed models are comparable to state-of-the-art models in terms of note accuracy on the MAESTRO dataset. We also investigate the effective model size and real-time inference latency by gradually streamlining the architecture. Finally, we conduct cross-dataset evaluation on unseen piano datasets and an in-depth analysis to elucidate the effect of the proposed components with respect to note length and pitch range.
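A minimal sketch of the frequency-conditioned FiLM idea described above, not the authors' exact code: per-frequency-bin scale and shift parameters modulate the CNN feature maps so the convolutional filters can adapt along the frequency axis. The tensor layout (batch, channels, time, freq) and the embedding size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FreqFiLM(nn.Module):
    """Feature-wise linear modulation conditioned on the frequency-bin position."""
    def __init__(self, n_channels: int, n_freq_bins: int, embed_dim: int = 16):
        super().__init__()
        # learned embedding for each frequency-bin position
        self.freq_embed = nn.Embedding(n_freq_bins, embed_dim)
        # maps the embedding to per-channel scale (gamma) and shift (beta)
        self.to_gamma_beta = nn.Linear(embed_dim, 2 * n_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, freq)
        freq_idx = torch.arange(x.size(-1), device=x.device)
        gamma, beta = self.to_gamma_beta(self.freq_embed(freq_idx)).chunk(2, dim=-1)
        # reshape to broadcast over batch and time: (1, channels, 1, freq)
        gamma = gamma.t().unsqueeze(0).unsqueeze(2)
        beta = beta.t().unsqueeze(0).unsqueeze(2)
        return gamma * x + beta

# usage: modulated = FreqFiLM(n_channels=48, n_freq_bins=229)(cnn_features)
```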
Abstract: The goal of real-time lyrics alignment is to take live singing audio as input and pinpoint the exact position within the given lyrics on the fly. The task can benefit real-world applications such as the automatic subtitling of live concerts or operas. However, designing a real-time model poses a great challenge due to the constraints of using only past input and operating with minimal latency. Furthermore, because of the lack of datasets for real-time lyrics alignment, previous studies have mostly been evaluated on private in-house datasets, resulting in a lack of standard evaluation methods. This paper presents a real-time lyrics alignment system for classical vocal performances with two contributions. First, we improve the lyrics alignment algorithm by finding an optimal combination of the chromagram and the phonetic posteriorgram (PPG), which capture the melodic and phonetic features of the singing voice, respectively. Second, we recast the Schubert Winterreise Dataset (SWD), which contains multiple performance renditions of the same pieces, as an evaluation set for real-time lyrics alignment.
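A hedged sketch of the feature-combination idea: chromagram frames (melodic content) are concatenated with PPG frames (phonetic content) before alignment. The PPG extractor and the mixing weight `alpha` are assumptions for illustration; the paper searches for the optimal combination.

```python
import numpy as np
import librosa

def combined_features(audio, sr, ppg, alpha=0.5, hop_length=512):
    """audio: mono waveform; ppg: (n_frames, n_phonemes) posteriorgram
    computed with the same hop size."""
    chroma = librosa.feature.chroma_cqt(y=audio, sr=sr, hop_length=hop_length).T
    n = min(len(chroma), len(ppg))  # crop to the shorter feature stream
    chroma = chroma[:n] / (np.linalg.norm(chroma[:n], axis=1, keepdims=True) + 1e-8)
    ppg = ppg[:n] / (np.linalg.norm(ppg[:n], axis=1, keepdims=True) + 1e-8)
    # weighted concatenation; the combined frames feed an online alignment algorithm
    return np.concatenate([alpha * chroma, (1 - alpha) * ppg], axis=1)
```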
Abstract: While piano music transcription models have shown high performance on solo piano recordings, their performance degrades when applied to ensemble recordings. This study analyzes the impact of different data augmentation methods on piano transcription performance, specifically focusing on mixing techniques applied to violin-piano ensembles. We apply mixing methods that consider both the harmonic and temporal characteristics of the audio. To create datasets for this study, we generated the PFVN-synth dataset, which contains 7 hours of violin-piano ensemble audio and corresponding labels rendered from MIDI files, and also collected unaccompanied violin recordings and mixed them with the MAESTRO dataset. We evaluated the transcription results on both the synthesized and the real-recording datasets.
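An illustrative sketch of one simple mix-based augmentation variant: overlaying an unaccompanied violin recording on a piano recording at a random level and temporal offset. The gain range and offset strategy are assumptions; the paper studies mixing methods that additionally account for harmonic characteristics.

```python
import numpy as np

def mix_piano_violin(piano, violin, snr_db_range=(-6.0, 6.0), rng=None):
    """Both inputs are mono waveforms at the same sample rate."""
    rng = rng or np.random.default_rng()
    # random temporal offset: start the violin somewhere inside the piano clip
    offset = rng.integers(0, max(1, len(piano) - len(violin))) if len(piano) > len(violin) else 0
    violin_padded = np.zeros_like(piano)
    seg = violin[: len(piano) - offset]
    violin_padded[offset : offset + len(seg)] = seg
    # scale the violin to a random piano-to-violin level difference
    snr_db = rng.uniform(*snr_db_range)
    p_pow = np.mean(piano ** 2) + 1e-12
    v_pow = np.mean(violin_padded ** 2) + 1e-12
    gain = np.sqrt(p_pow / (v_pow * 10 ** (snr_db / 10)))
    return piano + gain * violin_padded  # transcription labels remain the piano's note labels
```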
Abstract: Existing multi-instrumental datasets tend to be biased toward pop and classical music. In addition, they generally lack high-level annotations such as emotion tags. In this paper, we propose YM2413-MDB, an 80s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from Sega and MSX PC games of the 80s that use the YM2413, a programmable sound generator based on FM synthesis. The collected game music is arranged with a subset of 15 monophonic instruments and one drum instrument. The files were converted from the binary commands of the YM2413 sound chip. Each song was labeled with 19 emotion tags by two annotators and validated by three verifiers to obtain refined tags. We provide baseline models and results for emotion recognition and emotion-conditioned symbolic music generation using YM2413-MDB.
Abstract: The goal of this paper is to generate a visually appealing video that responds to music with a neural network, so that each frame of the video reflects the musical characteristics of the corresponding audio clip. To achieve this goal, we propose TräumerAI, a neural music visualizer that directly maps deep music embeddings to the style embeddings of StyleGAN; it consists of a music auto-tagging model using a short-chunk CNN and StyleGAN2 pre-trained on the WikiArt dataset. Rather than establishing an objective metric between musical and visual semantics, we manually labeled the pairs in a subjective manner. An annotator listened to 100 music clips, each 10 seconds long, and selected an image that suits the music from 200 StyleGAN-generated examples. Based on the collected data, we trained a simple transfer function that converts an audio embedding into a style embedding. The generated examples show that the mapping between audio and video achieves a certain level of intra-segment similarity and inter-segment dissimilarity.
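A hedged sketch of the transfer-function idea: a small regressor maps an audio-tagging embedding to a StyleGAN style vector, fit on the manually paired examples. The dimensions (256-d audio embedding, 512-d style vector) and the two-layer MLP are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AudioToStyle(nn.Module):
    """Maps a music auto-tagging embedding to a StyleGAN style embedding."""
    def __init__(self, audio_dim: int = 256, style_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, 512), nn.ReLU(),
            nn.Linear(512, style_dim),
        )

    def forward(self, audio_emb: torch.Tensor) -> torch.Tensor:
        return self.net(audio_emb)

def fit(model, pairs, epochs=200, lr=1e-3):
    """pairs: iterable of (audio_embedding, chosen_style_vector) tensors."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for audio_emb, style_vec in pairs:
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(audio_emb), style_vec)
            loss.backward()
            opt.step()
```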
Abstract: Recent advances in polyphonic piano transcription have been made primarily through the deliberate design of neural network architectures that detect different note states, such as onset or sustain, and model the temporal evolution of those states. The majority of them, however, use separate neural networks for each note state, thereby optimizing multiple loss functions, and they handle the temporal evolution of note states through abstract connections between the state-wise neural networks or a post-processing module. In this paper, we propose a unified neural network architecture in which multiple note states are predicted as a softmax output with a single loss function and the temporal order is learned by an autoregressive connection within the single network. This compact model allows the number of note states to be increased without added architectural complexity. Using the MAESTRO dataset, we examine various combinations of note states, including on, onset, sustain, re-onset, offset, and off. We also show that the autoregressive module effectively learns the inter-state dependency of notes. Finally, we show that our proposed model achieves performance comparable to state-of-the-art models with fewer parameters.
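A minimal sketch of the single-network idea, under assumed layer sizes and feature interface: at each frame, a single softmax per pitch predicts one of several note states, and the previous frame's predicted states are fed back as an autoregressive input, so one cross-entropy loss covers all states.

```python
import torch
import torch.nn as nn

class ARNoteStateDecoder(nn.Module):
    """Predicts per-pitch note states with a single softmax and an autoregressive feedback."""
    def __init__(self, acoustic_dim=768, n_pitches=88, n_states=5, hidden=256):
        super().__init__()
        self.n_pitches, self.n_states = n_pitches, n_states
        self.lstm = nn.LSTMCell(acoustic_dim + n_pitches * n_states, hidden)
        self.out = nn.Linear(hidden, n_pitches * n_states)

    def forward(self, acoustic_seq):
        # acoustic_seq: (batch, time, acoustic_dim) from the CNN front end
        b, t, _ = acoustic_seq.shape
        h = acoustic_seq.new_zeros(b, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        prev = acoustic_seq.new_zeros(b, self.n_pitches * self.n_states)
        outputs = []
        for step in range(t):
            h, c = self.lstm(torch.cat([acoustic_seq[:, step], prev], dim=-1), (h, c))
            logits = self.out(h).view(b, self.n_pitches, self.n_states)
            outputs.append(logits)
            # feed back the previous prediction as one-hot note states
            prev = nn.functional.one_hot(
                logits.argmax(-1), self.n_states).float().view(b, -1)
        # (batch, time, n_pitches, n_states); train with a single cross-entropy loss
        return torch.stack(outputs, dim=1)
```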