Souraja Kundu

Emotion-Guided Image to Music Generation

Oct 29, 2024

Souraja Kundu, Saket Singh, Yuji Iwahori

Abstract: Generating music from images can enhance various applications, including background music for photo slideshows, social media experiences, and video creation. This paper presents an emotion-guided image-to-music generation framework that leverages the Valence-Arousal (VA) emotional space to produce music that aligns with the emotional tone of a given image. Unlike previous models that rely on contrastive learning for emotional consistency, the proposed approach directly integrates a VA loss function to enable accurate emotional alignment. The model employs a CNN-Transformer architecture, featuring pre-trained CNN image feature extractors and three Transformer encoders to capture complex, high-level emotional features from MIDI music. Three Transformer decoders refine these features to generate musically and emotionally consistent MIDI sequences. Experimental results on a newly curated emotionally paired image-MIDI dataset demonstrate the proposed model's superior performance across metrics such as Polyphony Rate, Pitch Entropy, Groove Consistency, and loss convergence.
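To make two of the quantities named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not define its VA loss or its metrics. The sketch assumes that Pitch Entropy is the Shannon entropy of the pitch histogram of a note sequence (one common definition) and that a VA loss could be a mean squared error between predicted and target (valence, arousal) pairs. The function names `pitch_entropy` and `va_mse` are hypothetical.

```python
import math
from collections import Counter


def pitch_entropy(pitches):
    """Shannon entropy, in bits, of the pitch histogram of a note sequence.

    Assumption: `pitches` is a list of MIDI pitch numbers (0-127), one per note.
    """
    if not pitches:
        return 0.0
    counts = Counter(pitches)
    total = len(pitches)
    # Sum -p * log2(p) over every distinct pitch.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def va_mse(predicted, target):
    """Mean squared error between two (valence, arousal) pairs.

    Assumption: an illustrative stand-in for a VA loss, not the paper's formula.
    """
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Four equally likely pitches give 2.0 bits of entropy.
print(pitch_entropy([60, 62, 64, 65]))
# Distance between a predicted and a target emotion point in VA space.
print(va_mse((0.5, -0.5), (0.0, 0.0)))
```

Under these assumptions, a higher pitch entropy indicates a more varied use of pitches, and a lower VA error indicates closer emotional agreement between the image and the generated music.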

* 2024 6th Asian Digital Image Processing Conference
