Dimos Makris

Singapore University of Technology and Design

Conditional Drums Generation using Compound Word Representations

Feb 21, 2022

Dimos Makris, Guo Zixun, Maximos Kaliakatsos-Papakostas, Dorien Herremans

Abstract: The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process of sequential data. We present a sequence-to-sequence architecture where a Bidirectional Long Short-Term Memory (BiLSTM) Encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based Decoder with relative global attention produces the generated drum sequences. We conducted experiments to thoroughly compare the effectiveness of our method to several baselines. Quantitative evaluation shows that our model is able to generate drum sequences that have similar statistical distributions and characteristics to the training corpus. These features include syncopation, compression ratio, and symmetry, among others. We also verified, through a listening test, that generated drum sequences sound pleasant, natural and coherent while they "groove" with the given accompaniment.

* Accepted for the 11th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART), 2022

Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses

Feb 19, 2022

Phoebe Chua, Dimos Makris, Dorien Herremans, Gemma Roig, Kat Agres

Abstract: Although media content is increasingly produced, distributed, and consumed in multiple combinations of modalities, how individual modalities contribute to the perceived emotion of a media item remains poorly understood. In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived emotion of media. The data were collected by presenting music videos to participants in three conditions: music, visual, and audiovisual. Participants annotated the music videos for valence and arousal over time, as well as the overall emotion conveyed. We present detailed descriptive statistics for key measures in the dataset and the results of feature importance analyses for each condition. Finally, we propose a novel transfer learning architecture to train Predictive models Augmented with Isolated modality Ratings (PAIR) and demonstrate the potential of isolated modality ratings for enhancing multimodal emotion recognition. Our results suggest that perceptions of arousal are influenced primarily by auditory information, while perceptions of valence are more subjective and can be influenced by both visual and auditory information. The dataset is made publicly available.

* 16 pages with 9 figures

Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework

Apr 27, 2021

Dimos Makris, Kat R. Agres, Dorien Herremans

Abstract: The field of automatic music composition has seen great progress in the last few years, much of which can be attributed to advances in deep neural networks. There are numerous studies that present different strategies for generating sheet music from scratch. The inclusion of high-level musical characteristics (e.g., perceived emotional qualities) as conditions for controlling the generation output, however, remains a challenge. In this paper, we present a novel approach for calculating the valence (the positivity or negativity of the perceived emotion) of a chord progression within a lead sheet, using pre-defined mood tags proposed by music experts. Based on this approach, we propose a novel strategy for conditional lead sheet generation that allows us to steer the music generation in terms of valence, phrasing, and time signature. Our approach is similar to a Neural Machine Translation (NMT) problem, as we include high-level conditions in the encoder part of the sequence-to-sequence architectures used (i.e., long short-term memory networks and a Transformer network). We conducted experiments to thoroughly analyze these two architectures. The results show that the proposed strategy is able to generate lead sheets in a controllable manner, resulting in distributions of musical attributes similar to those of the training dataset. We also verified through a subjective listening test that our approach is effective in controlling the valence of a generated chord progression.

* Accepted for the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)

DeepDrum: An Adaptive Conditional Neural Network

Sep 17, 2018

Dimos Makris, Maximos Kaliakatsos-Papakostas, Katia Lida Kermanidis

Abstract: Considering music as a sequence of events with multiple complex dependencies, the Long Short-Term Memory (LSTM) architecture has proven very efficient in learning and reproducing musical styles. However, the generation of rhythms requires additional information regarding musical structure and accompanying instruments. In this paper we present DeepDrum, an adaptive Neural Network capable of generating drum rhythms under constraints imposed by Feed-Forward (Conditional) Layers which contain musical parameters along with given instrumentation information (e.g., bass and guitar notes). Results on generated drum sequences are presented, indicating that DeepDrum is effective in producing rhythms that resemble the learned style while conforming to given constraints that were unknown during the training process.

* 2018 Joint Workshop on Machine Learning for Music. The Federated Artificial Intelligence Meeting (FAIM), a joint workshop program of ICML, IJCAI/ECAI, and AAMAS
