Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Andrew McPherson

Designing Neural Synthesizers for Low Latency Interaction

Mar 14, 2025

Franco Caspe, Jordie Shier, Mark Sandler, Charalampos Saitis, Andrew McPherson

Abstract:Neural Audio Synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real-time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, we investigate the sources of latency and jitter typically found in interactive NAS models. We then apply this analysis to the task of timbre transfer using RAVE, a convolutional variational autoencoder for audio waveforms introduced by Caillon et al. in 2021. Finally, we present an iterative design approach for optimizing latency. This culminates with a model we call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. We implement it in a specialized inference framework for low-latency, real-time inference and present a proof-of-concept audio plugin compatible with audio signals from musical instruments. We expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.

* See website at fcaspe.github.io/brave - 13 pages, 5 figures, accepted to the Journal of the Audio Engineering Society

Via

Access Paper or Ask Questions

Real-time Timbre Remapping with Differentiable DSP

Jul 05, 2024

Jordie Shier, Charalampos Saitis, Andrew Robertson, Andrew McPherson

Abstract:Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging differentiable digital signal processing, our method facilitates direct optimization of synthesizer parameters through a novel feature difference loss. This loss function, designed to learn relative timbral differences between musical events, prioritizes the subtleties of graded timbre modulations within phrases, allowing for meaningful translations in a timbre space. Using snare drum performances as a case study, where timbral expression is central, we demonstrate real-time timbre remapping from acoustic snare drums to a differentiable synthesizer modeled after the Roland TR-808.

* Accepted for publication at the 24th International Conference on New Interfaces for Musical Expression in Utrecht, Netherlands

Via

Access Paper or Ask Questions

FM Tone Transfer with Envelope Learning

Oct 07, 2023

Franco Caspe, Andrew McPherson, Mark Sandler

Abstract:Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversity, and limited transient and dynamic rendering, which we believe hinder its possibilities of articulation and phrasing in a real-time performance context. In this work, we present a discussion on current Tone Transfer architectures for the task of controlling synthetic audio with musical instruments and discuss their challenges in allowing expressive performances. Next, we introduce Envelope Learning, a novel method for designing Tone Transfer architectures that map musical events using a training objective at the synthesis parameter level. Our technique can render note beginnings and endings accurately and for a variety of sounds; these are essential steps for improving musical articulation, phrasing, and sound diversity with Tone Transfer. Finally, we implement a VST plugin for real-time live use and discuss possibilities for improvement.

* Accepted to Audio Mostly 2023

Via

Access Paper or Ask Questions

Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

Sep 13, 2023

Jordie Shier, Franco Caspe, Andrew Robertson, Mark Sandler, Charalampos Saitis, Andrew McPherson

Figure 1 for Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

Figure 2 for Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

Figure 3 for Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

Figure 4 for Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

Abstract:Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework aiming to address transient generation and percussive synthesis within a DDSP framework. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak picking algorithm to generate time-varying non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. We compute a set of reconstruction metrics using a large dataset of acoustic and electronic percussion samples that show that our method leads to improved onset signal reconstruction for membranophone percussion instruments.

* To be published in The Proceedings of Forum Acusticum, Sep 2023, Turin, Italy

Via

Access Paper or Ask Questions

A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis

Aug 29, 2023

Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis

Abstract:The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.

* Under review for Frontiers in Signal Processing

Via

Access Paper or Ask Questions

Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform

Jun 20, 2023

Teresa Pelinski, Rodrigo Diaz, Adán L. Benito Temprano, Andrew McPherson

Figure 1 for Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform

Figure 2 for Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform

Figure 3 for Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform

Figure 4 for Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform

Abstract:Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available on-device. Moreover, many music-making applications demand real-time inference. Embedded hardware platforms for audio, such as Bela, offer an entry point for beginners into physical audio computing; however, the need for cross-compilation environments and low-level software development tools for deploying embedded deep learning models imposes high entry barriers on non-expert users. We present a pipeline for deploying neural networks in the Bela embedded hardware platform. In our pipeline, we include a tool to record a multichannel dataset of sensor signals. Additionally, we provide a dockerised cross-compilation environment for faster compilation. With this pipeline, we aim to provide a template for programmers and makers to prototype and experiment with neural networks for real-time embedded musical applications.

Via

Access Paper or Ask Questions

DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Aug 12, 2022

Franco Caspe, Andrew McPherson, Mark Sandler

Figure 1 for DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Figure 2 for DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Figure 3 for DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Figure 4 for DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Abstract:FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. The training process involves a corpus of audio for supervision, and spectral reconstruction loss functions. Such functions, while being great to match spectral amplitudes, present a lack of pitch direction which can hinder the joint optimization of the parameters of FM synthesizers. In this paper, we take steps towards enabling continuous control of a well-established FM synthesis architecture from an audio input. Firstly, we discuss a set of design constraints that ease spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset, and quantitatively demonstrate its comparable audio quality against selected benchmarks.

* Accepted to ISMIR 2022. See online supplement at https://fcaspe.github.io/ddx7/

Via

Access Paper or Ask Questions

An embedded multichannel sound acquisition system for drone audition

Jan 17, 2021

Michael Clayton, Lin Wang, Andrew McPherson, Andrea Cavallaro

Figure 1 for An embedded multichannel sound acquisition system for drone audition

Figure 2 for An embedded multichannel sound acquisition system for drone audition

Figure 3 for An embedded multichannel sound acquisition system for drone audition

Figure 4 for An embedded multichannel sound acquisition system for drone audition

Abstract:Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage the research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mounted on a quadcopter. In addition to recording and storing locally the sound from multiple microphones simultaneously, the embedded system can connect wirelessly to a remote terminal to transfer audio files for further processing. This will be the first stage towards creating a fully embedded solution for drone audition. We present experimental results obtained by state-of-the-art drone audition algorithms applied to the sound recorded by the embedded system.

Via

Access Paper or Ask Questions