Stefan Bilbao

Resampling Filter Design for Multirate Neural Audio Effect Processing

Jan 30, 2025

Alistair Carson, Vesa Välimäki, Alec Wright, Stefan Bilbao

Abstract: Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to recurrent neural network architecture to approximate a sample rate independent system, enabling audio processing at a rate that differs from the original training rate. This method works well for integer oversampling and can reduce aliasing caused by nonlinear activation functions. For small fractional changes in sample rate, fractional delay filters can be used to approximate sample rate independence, but in some cases this method fails entirely. Here, we explore the use of signal resampling at the input and output of the neural network as an alternative solution. We investigate several resampling filter designs and show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give results similar to or better than the previously proposed model adjustment method with far fewer operations per sample and less than one millisecond of latency at typical audio rates. Furthermore, we investigate interpolation and decimation filters for the task of integer oversampling and show that cascaded half-band IIR and FIR designs can be used in conjunction with the model adjustment method to reduce aliasing in a range of distortion effect models.
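
As a rough illustration of one ingredient named in this abstract, the sketch below designs a Kaiser-window lowpass FIR filter using the standard Kaiser formulas for the window parameter and filter length. It is not the authors' design: the function names, the example specification (cutoff 0.2 and transition width 0.05 cycles per sample, 80 dB attenuation) and the normalisation to unit DC gain are all assumptions made here, and the half-band IIR stage and the rate conversion itself are left out.

```python
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser_lowpass(cutoff, transition, atten_db):
    """Linear-phase lowpass FIR taps; cutoff and transition in cycles/sample."""
    # Kaiser's empirical formula for the window shape parameter.
    if atten_db > 50:
        beta = 0.1102 * (atten_db - 8.7)
    elif atten_db >= 21:
        beta = 0.5842 * (atten_db - 21) ** 0.4 + 0.07886 * (atten_db - 21)
    else:
        beta = 0.0
    # Kaiser's length estimate, with the transition width in rad/sample.
    n_taps = math.ceil((atten_db - 7.95) / (2.285 * 2 * math.pi * transition)) + 1
    n_taps |= 1  # force an odd length so the filter has a centre tap
    mid = (n_taps - 1) / 2
    taps = []
    for n in range(n_taps):
        t = n - mid
        # Ideal lowpass impulse response (sampled sinc).
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = bessel_i0(beta * math.sqrt(1 - (t / mid) ** 2)) / bessel_i0(beta)
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise to unit gain at DC

# Hypothetical specification, chosen only for illustration.
taps = kaiser_lowpass(cutoff=0.2, transition=0.05, atten_db=80.0)
latency_samples = (len(taps) - 1) // 2  # group delay of a linear-phase FIR
```

The latency of a linear-phase FIR stage is half its length minus one, so the transition width and attenuation chosen here trade directly against the latency figure the abstract reports.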

* Preprint

Interpolation filter design for sample rate independent audio effect RNNs

Sep 24, 2024

Alistair Carson, Alec Wright, Stefan Bilbao

Abstract: Recurrent neural networks (RNNs) are effective at emulating the non-linear, stateful behavior of analog guitar amplifiers and distortion effects. Unlike the case of direct circuit simulation, RNNs have a fixed sample rate encoded in their model weights, making the sample rate non-adjustable during inference. Recent work has proposed increasing the sample rate of RNNs at inference (oversampling) by increasing the feedback delay length in samples, using a fractional delay filter for non-integer conversions. Here, we investigate the task of lowering the sample rate at inference (undersampling), and propose using an extrapolation filter to approximate the required fractional signal advance. We consider two filter design methods and analyse the impact of filter order on audio quality. Our results show that the correct choice of filter can give high quality results for both oversampling and undersampling; however, in some cases the sample rate adjustment leads to unwanted artefacts in the output signal. We analyse these failure cases through linearised stability analysis, showing that they result from instability around a fixed point. This approach enables an informed prediction of suitable interpolation filters for a given RNN model before runtime.
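
To make the "fractional signal advance" concrete, here is a minimal sketch of polynomial extrapolation from stored past states. The abstract does not name its two filter design methods, so Lagrange extrapolation is used purely as an assumed stand-in; the function names and the 48 kHz to 44.1 kHz example are likewise hypothetical.

```python
def lagrange_taps(position, order):
    """Lagrange weights for evaluating a signal at `position` on the grid 0..order.

    Grid point k holds the sample that is k steps old. A negative position
    lies beyond the newest sample, so the weights extrapolate (a signal advance).
    """
    taps = []
    for k in range(order + 1):
        weight = 1.0
        for m in range(order + 1):
            if m != k:
                weight *= (position - m) / (k - m)
        taps.append(weight)
    return taps

def predict_advanced(past_states, advance, order=1):
    """Estimate the state `advance` samples after the newest entry of past_states."""
    taps = lagrange_taps(-advance, order)
    return sum(t * s for t, s in zip(taps, past_states))

# Hypothetical case: a model trained at 48 kHz and run at 44.1 kHz.
train_rate, run_rate = 48000.0, 44100.0
delay = run_rate / train_rate   # feedback delay in samples at the new rate (< 1)
advance = 1.0 - delay           # how far past the newest stored state to predict
taps = lagrange_taps(-advance, order=1)  # first-order (linear) extrapolator
```

Because the required delay is shorter than one sample, the wanted state lies ahead of anything already computed, which is why the weights extrapolate rather than interpolate. Higher orders track smooth signals more closely but have larger tap magnitudes, which is one plausible reason filter order matters for the stability behaviour the abstract describes.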

Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing

Jun 10, 2024

Alistair Carson, Alec Wright, Jatin Chowdhury, Vesa Välimäki, Stefan Bilbao

Abstract: In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
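
The cubic Lagrange interpolation mentioned in this abstract can be sketched as a four-tap fractional delay line. This is a generic textbook construction written for illustration, not the paper's implementation: the class and function names are hypothetical, and the example delay of 1.5 samples is arbitrary. How the read position is offset inside a recurrent loop depends on when the state is written, and that bookkeeping is not shown.

```python
from collections import deque

def cubic_lagrange_taps(delay):
    """Four FIR taps approximating a delay of `delay` samples.

    Accuracy is best when the delay lies in the centre interval, 1 <= delay <= 2.
    """
    d = delay
    return [
        -(d - 1) * (d - 2) * (d - 3) / 6.0,
        d * (d - 2) * (d - 3) / 2.0,
        -d * (d - 1) * (d - 3) / 2.0,
        d * (d - 1) * (d - 2) / 6.0,
    ]

class FractionalDelayLine:
    """Delay line read with cubic Lagrange interpolation (illustrative only)."""

    def __init__(self, delay):
        self.taps = cubic_lagrange_taps(delay)
        # buffer[0] is the newest sample, buffer[k] is k samples old.
        self.buffer = deque([0.0] * 4, maxlen=4)

    def push(self, sample):
        self.buffer.appendleft(sample)

    def read(self):
        return sum(t * x for t, x in zip(self.taps, self.buffer))

# Feed a ramp and read it back 1.5 samples late.
line = FractionalDelayLine(1.5)
for n in range(10):
    line.push(float(n))
delayed = line.read()  # value of the ramp 1.5 samples before the newest input
```

A cubic interpolator reproduces any polynomial up to third degree exactly, so a ramp input is a convenient sanity check. For a non-integer rate change such as 44.1 kHz to 48 kHz, the delay would be the ratio of the two rates, roughly 1.088 samples.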

* Accepted for publication in Proc. DAFx24, Guildford, UK, September 2024

Differentiable All-pole Filters for Time-varying Audio Systems

Apr 12, 2024

Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within any audio system containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code available and provide the trained audio effect and synth models in a VST plugin at https://christhetree.github.io/all_pole_filters/.
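
The recursion that makes these filters awkward for automatic differentiation can be written out directly. The sketch below shows only the forward pass of a time-varying all-pole filter in plain Python, under the usual sign convention in which feedback coefficients are subtracted. The function name and the test signals are made up for illustration, and the gradient formulation from the paper is not reproduced here.

```python
def time_varying_all_pole(x, coeffs):
    """Forward pass of y[n] = x[n] - sum_k coeffs[n][k-1] * y[n-k].

    x is a list of input samples and coeffs[n] holds the feedback
    coefficients in use at time step n. The initial state is zero.
    """
    order = len(coeffs[0])
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for k in range(1, order + 1):
            if n - k >= 0:
                # Each output depends on earlier outputs: the sequential
                # dependency that cannot be evaluated in parallel.
                acc -= coeffs[n][k - 1] * y[n - k]
        y.append(acc)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]

# Fixed one-pole filter with its pole at 0.5: the impulse response halves each step.
decay = time_varying_all_pole(impulse, [[-0.5]] * 5)

# Same filter, but the feedback is switched off from step 2 onwards.
gated = time_varying_all_pole(impulse, [[-0.5], [-0.5], [0.0], [0.0], [0.0]])
```

Every output sample feeds the next one, so a framework that records each step builds a computation graph as long as the signal. That cost is the motivation the abstract gives for deriving the backward pass by hand.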

* Submitted to DAFx 2024

Differentiable Grey-box Modelling of Phaser Effects using Frame-based Spectral Processing

Jun 02, 2023

Alistair Carson, Cassia Valentini-Botinhao, Simon King, Stefan Bilbao

Abstract: Machine learning approaches to modelling analog audio effects have seen intensive investigation in recent years, particularly in the context of non-linear time-invariant effects such as guitar amplifiers. For modulation effects such as phasers, however, new challenges emerge due to the presence of the low-frequency oscillator which controls the slowly time-varying nature of the effect. Existing approaches have either required foreknowledge of this control signal, or have been non-causal in implementation. This work presents a differentiable digital signal processing approach to modelling phaser effects in which the underlying control signal and time-varying spectral response of the effect are jointly learned. The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain, with a transfer function based on typical analog phaser circuit topology. We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters. The frame duration is an important hyper-parameter of the proposed model, so an investigation was carried out into its effect on model accuracy. The optimal frame length depends on both the rate and transient decay-time of the target effect, but the frame length can be altered at inference time without a significant change in accuracy.
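
One way to picture the frame-based idea is to compute a fresh frequency response for each frame from the current value of a control signal. The abstract does not give the transfer function, so the sketch below assumes a common textbook phaser: a cascade of identical first-order allpass sections mixed equally with the dry signal, with a sinusoidal oscillator setting the allpass coefficient. All names, the number of stages and the parameter ranges are hypothetical, and the per-frame FFT, multiplication and overlap-add are omitted.

```python
import cmath
import math

def allpass_response(p, omega):
    """First-order allpass (p + z^-1) / (1 + p z^-1) evaluated at frequency omega."""
    z_inv = cmath.exp(-1j * omega)
    return (p + z_inv) / (1 + p * z_inv)

def phaser_frame_response(p, n_bins, stages=4, mix=0.5):
    """Response for one frame on n_bins frequencies from 0 to Nyquist."""
    response = []
    for b in range(n_bins):
        omega = math.pi * b / (n_bins - 1)
        wet = allpass_response(p, omega) ** stages
        # Mixing dry and phase-shifted paths creates the phaser's notches.
        response.append((1 - mix) + mix * wet)
    return response

def lfo_to_coefficient(frame_index, frames_per_cycle, p_min=-0.9, p_max=-0.3):
    """Map a sinusoidal low-frequency oscillator to an allpass coefficient."""
    phase = 2 * math.pi * frame_index / frames_per_cycle
    return p_min + (p_max - p_min) * 0.5 * (1 + math.sin(phase))

# Response for the first frame of a hypothetical 100-frame modulation cycle.
p0 = lfo_to_coefficient(0, 100)
frame_response = phaser_frame_response(p0, n_bins=2049)
```

In a frame-based scheme of this kind, each frame's spectrum would be multiplied by its own response. The frame length then sets how finely the modulation is tracked, which is consistent with the abstract singling it out as a key hyper-parameter.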

* Accepted for publication in Proc. DAFx23, Copenhagen, Denmark, September 2023
