Abstract:In recent years, the quality and public interest in music generation systems have grown, encouraging research into various ways to control these systems. We propose a novel method for controlling surprisal in music generation using sequence models. To achieve this goal, we define a metric called Instantaneous Information Content (IIC). The IIC serves as a proxy function for the perceived musical surprisal (as estimated from a probabilistic model) and can be calculated at any point within a music piece. This enables the comparison of surprisal across different musical content even if the musical events occur in irregular time intervals. We use beam search to generate musical material whose IIC curve closely approximates a given target IIC. We experimentally show that the IIC correlates with harmonic and rhythmic complexity and note density. The correlation decreases with the length of the musical context used for estimating the IIC. Finally, we conduct a qualitative user study to test if human listeners can identify the IIC curves that have been used as targets when generating the respective musical material. We provide code for creating IIC interpolations and IIC visualizations on https://github.com/muthissar/iic.
Abstract:Research in natural language processing has demonstrated that the quality of generations from trained autoregressive language models is significantly influenced by the used sampling strategy. In this study, we investigate the impact of different sampling techniques on musical qualities such as diversity and structure. To accomplish this, we train a high-capacity transformer model on a vast collection of highly-structured Irish folk melodies and analyze the musical qualities of the samples generated using distribution truncation sampling techniques. Specifically, we use nucleus sampling, the recently proposed "typical sampling", and conventional ancestral sampling. We evaluate the effect of these sampling strategies in two scenarios: optimal circumstances with a well-calibrated model and suboptimal circumstances where we systematically degrade the model's performance. We assess the generated samples using objective and subjective evaluations. We discover that probability truncation techniques may restrict diversity and structural patterns in optimal circumstances, but may also produce more musical samples in suboptimal circumstances.
Abstract:It has been shown in a recent publication that words in human-produced English language tend to have an information content close to the conditional entropy. In this paper, we show that the same is true for events in human-produced monophonic musical sequences. We also show how "typical sampling" influences the distribution of information around the entropy for single events and sequences.