Julian D. Parker

Stable Audio Open

Jul 19, 2024

Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Abstract: Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1 kHz.

* Demo: https://stability-ai.github.io/stable-audio-open-demo/ ; Weights: https://huggingface.co/stabilityai/stable-audio-open-1.0 ; Code: https://github.com/Stability-AI/stable-audio-tools ; arXiv admin note: text overlap with arXiv:2404.10301

Long-form music generation with latent diffusion

Apr 16, 2024

Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Abstract: Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5 Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
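
To give a sense of scale for the figures quoted in this abstract, the sketch below works out how many latent frames a 4m45s track occupies at a 21.5 Hz latent rate, and roughly how strongly the audio is downsampled. The 44.1 kHz sample rate is an assumption borrowed from the Stable Audio Open abstract, not something this abstract states, and the function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the numbers quoted in the abstract.
# ASSUMPTION: 44.1 kHz audio. This abstract gives only the latent rate
# (21.5 Hz) and the maximum duration (4m45s).

SAMPLE_RATE_HZ = 44_100      # assumed, not stated in this abstract
LATENT_RATE_HZ = 21.5        # from the abstract
DURATION_S = 4 * 60 + 45     # 4m45s from the abstract, in seconds


def downsampling_factor(sample_rate_hz: float, latent_rate_hz: float) -> float:
    """Audio samples represented by one latent frame."""
    return sample_rate_hz / latent_rate_hz


def full_latent_frames(duration_s: float, latent_rate_hz: float) -> int:
    """Number of complete latent frames that fit in the given duration."""
    return int(duration_s * latent_rate_hz)


factor = downsampling_factor(SAMPLE_RATE_HZ, LATENT_RATE_HZ)
frames = full_latent_frames(DURATION_S, LATENT_RATE_HZ)
samples = DURATION_S * SAMPLE_RATE_HZ

print(f"samples per channel: {samples}")
print(f"latent frames:       {frames}")
print(f"downsampling factor: about {factor:.0f}x")
```

Under that assumption, the sequence the model has to attend over is a few thousand frames long rather than millions of samples, which is what makes long temporal contexts tractable.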

StemGen: A music generation model that listens

Dec 14, 2023

Julian D. Parker, Janne Spijkervet, Katerina Kosta, Furkan Yesiler, Boris Kuznetsov, Ju-Chiang Wang, Matt Avent, Jitong Chen, Duc Le

Abstract: End-to-end generation of musical audio using deep learning techniques has seen an explosion of activity recently. However, most models concentrate on generating fully mixed music in response to abstract conditioning information. In this work, we present an alternative paradigm for producing music generation models that can listen and respond to musical context. We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture and present a number of novel architectural and sampling improvements. We train the described architecture on both an open-source and a proprietary dataset. We evaluate the produced models using standard quality metrics and a new approach based on music information retrieval descriptors. The resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context.

* Accepted for publication at ICASSP 2024

Differentiable Allpass Filters for Phase Response Estimation and Automatic Signal Alignment

Jun 02, 2023

Anders R. Bargum, Stefania Serafin, Cumhur Erkut, Julian D. Parker

Abstract: Virtual analog (VA) audio effects are increasingly based on neural networks and deep learning frameworks. Due to the underlying black-box methodology, a successful model will learn to approximate the data it is presented, including potential errors such as latency and audio dropouts as well as non-linear characteristics and frequency-dependent phase shifts produced by the hardware. The latter is of particular interest as the learned phase-response might cause unwanted audible artifacts when the effect is used for creative processing techniques such as dry-wet mixing or parallel compression. To overcome these artifacts we propose differentiable signal processing tools and deep optimization structures for automatically tuning all-pass filters to predict the phase response of different VA simulations, and align processed signals that are out of phase. The approaches are assessed using objective metrics while listening tests evaluate their ability to enhance the quality of parallel path processing techniques. Ultimately, an over-parameterized, BiasNet-based, all-pass model is proposed for the optimization problem under consideration, resulting in models that can estimate all-pass filter coefficients to align a dry signal with its affected, wet, equivalent.

* Collaboration done while interning/employed at Native Instruments. Accepted for publication in Proc. DAFX'23, Copenhagen, Denmark, September 2023. Sound examples at https://abargum.github.io v2: 10 pages, LaTeX; figures resized, pdf optimized
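
As background on why all-pass filters suit phase alignment, the sketch below evaluates a textbook first-order all-pass section, H(z) = (a + z^-1) / (1 + a z^-1), and shows that it leaves magnitude untouched while shifting phase in a frequency-dependent way. This is a generic illustration, not the paper's differentiable or BiasNet-based model; the coefficient value and function name are hypothetical.

```python
import cmath
import math


def allpass_response(a: float, omega: float) -> complex:
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) at z = e^{j*omega}."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)


coeff = 0.5  # hypothetical coefficient; |a| < 1 keeps the filter stable

# Nine normalized frequencies from 0 (DC) to pi (Nyquist).
freqs = [math.pi * k / 8 for k in range(9)]
responses = [allpass_response(coeff, w) for w in freqs]

magnitudes = [abs(h) for h in responses]       # unity at every frequency
phases = [cmath.phase(h) for h in responses]   # varies with frequency

for w, m, p in zip(freqs, magnitudes, phases):
    print(f"omega={w:.3f}  |H|={m:.6f}  phase={p:+.3f} rad")
```

Because only the phase changes, tuning the coefficient of such a section (or a cascade of them) can bring a dry signal into line with a processed one without altering its spectrum, which is the property the abstract relies on.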

Physical Modeling using Recurrent Neural Networks with Fast Convolutional Layers

Apr 21, 2022

Julian D. Parker, Sebastian J. Schlecht, Rudolf Rabenstein, Maximilian Schäfer

Abstract: Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine-learning to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.

* Submitted to DAFx2022
