Florian Grötschla

Parametric Neural Amp Modeling with Active Learning

Jul 02, 2025

Florian Grötschla, Luca A. Lanzendörfer, Longxiang Jiao, Roger Wattenhofer

Abstract: We introduce PANAMA, an active learning framework for the training of end-to-end parametric guitar amp models using a WaveNet-like architecture. With PANAMA, one can create a virtual amp by recording samples that are determined by an active learning strategy to use a minimum amount of datapoints (i.e., amp knob settings). We show that gradient-based optimization algorithms can be used to determine the optimal datapoints to sample, and that the approach helps under a constrained number of samples.

* Accepted at ISMIR 2025 as Late-Breaking Demo (LBD)

Benchmarking Music Generation Models and Metrics via Human Preference Studies

Jun 23, 2025

Florian Grötschla, Ahmet Solak, Luca A. Lanzendörfer, Roger Wattenhofer

Abstract: Recent advancements have brought generated music closer to human-created compositions, yet evaluating these models remains challenging. While human preference is the gold standard for assessing quality, translating these subjective judgments into objective metrics, particularly for text-audio alignment and music quality, has proven difficult. In this work, we generate 6k songs using 12 state-of-the-art models and conduct a survey of 15k pairwise audio comparisons with 2.5k human participants to evaluate the correlation between human preferences and widely used metrics. To the best of our knowledge, this work is the first to rank current state-of-the-art music generation models and metrics based on human preference. To further the field of subjective metric evaluation, we provide open access to our dataset of generated music and human evaluations.

* In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025

High-Fidelity Music Vocoder using Neural Audio Codecs

Feb 18, 2025

Luca A. Lanzendörfer, Florian Grötschla, Michael Ungersböck, Roger Wattenhofer

Abstract: While neural vocoders have made significant progress in high-fidelity speech synthesis, their application to polyphonic music has remained underexplored. In this work, we propose DisCoder, a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms. Our approach first transforms the mel spectrogram into a lower-dimensional representation aligned with the Descript Audio Codec (DAC) latent space before reconstructing it to an audio signal using a fine-tuned DAC decoder. DisCoder achieves state-of-the-art performance in music synthesis on several objective metrics and in a MUSHRA listening study. Our approach also shows competitive performance in speech synthesis, highlighting its potential as a universal vocoder.

* Accepted at ICASSP 2025

Audio Atlas: Visualizing and Exploring Audio Datasets

Nov 30, 2024

Luca A. Lanzendörfer, Florian Grötschla, Uzeyir Valizada, Roger Wattenhofer

Abstract: We introduce Audio Atlas, an interactive web application for visualizing audio data using text-audio embeddings. Audio Atlas is designed to facilitate the exploration and analysis of audio datasets using a contrastive embedding model and a vector database for efficient data management and semantic search. The system maps audio embeddings into a two-dimensional space and leverages DeepScatter for dynamic visualization. Designed for extensibility, Audio Atlas allows easy integration of new datasets, enabling users to better understand their audio data and identify both patterns and outliers. We open-source the codebase of Audio Atlas, and provide an initial implementation containing various audio and music datasets.

* Extended Abstract at ISMIR 2024

Benchmarking Positional Encodings for GNNs and Graph Transformers

Nov 19, 2024

Florian Grötschla, Jiaqing Xie, Roger Wattenhofer

Abstract: Recent advances in Graph Neural Networks (GNNs) and Graph Transformers (GTs) have been driven by innovations in architectures and Positional Encodings (PEs), which are critical for augmenting node features and capturing graph topology. PEs are essential for GTs, where topological information would otherwise be lost without message-passing. However, PEs are often tested alongside novel architectures, making it difficult to isolate their effect on established models. To address this, we present a comprehensive benchmark of PEs in a unified framework that includes both message-passing GNNs and GTs. We also establish theoretical connections between MPNNs and GTs and introduce a sparsified GRIT attention mechanism to examine the influence of global connectivity. Our findings demonstrate that previously untested combinations of GNN architectures and PEs can outperform existing methods and offer a more comprehensive picture of the state-of-the-art. To support future research and experimentation in our framework, we make the code publicly available.

SNAC: Multi-Scale Neural Audio Codec

Oct 18, 2024

Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Abstract: Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual Vector Quantization (RVQ) has become the standard technique for neural audio compression using a cascade of VQ codebooks. This paper proposes the Multi-Scale Neural Audio Codec, a simple extension of RVQ where the quantizers can operate at different temporal resolutions. By applying a hierarchy of quantizers at variable frame rates, the codec adapts to the audio structure across multiple timescales. This leads to more efficient compression, as demonstrated by extensive objective and subjective evaluations. The code and model weights are open-sourced at https://github.com/hubertsiuzdak/snac.
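
To make the idea of residual quantization at several temporal resolutions concrete, here is a minimal toy sketch. It is not the SNAC implementation: the scalar codebooks, the average pooling, the nearest-neighbor upsampling, and all function names are assumptions chosen for illustration, whereas the actual codec uses a learned encoder and decoder with vector codebooks.

```python
from typing import List, Sequence, Tuple


def nearest(codebook: Sequence[float], x: float) -> int:
    """Index of the codebook entry closest to x."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))


def avg_pool(x: Sequence[float], stride: int) -> List[float]:
    """Downsample by averaging non-overlapping windows of `stride` frames."""
    windows = [x[i:i + stride] for i in range(0, len(x), stride)]
    return [sum(w) / len(w) for w in windows]


def upsample(x: Sequence[float], stride: int, length: int) -> List[float]:
    """Repeat each coarse frame `stride` times (nearest-neighbor upsampling)."""
    return [x[i // stride] for i in range(length)]


def multiscale_rvq(
    signal: Sequence[float],
    codebooks: Sequence[Sequence[float]],
    strides: Sequence[int],
) -> Tuple[List[List[int]], List[float]]:
    """Quantize the running residual stage by stage, each at its own frame rate."""
    residual = list(signal)
    recon = [0.0] * len(signal)
    all_codes: List[List[int]] = []
    for codebook, stride in zip(codebooks, strides):
        coarse = avg_pool(residual, stride)            # lower temporal resolution
        codes = [nearest(codebook, v) for v in coarse]
        quantized = upsample([codebook[c] for c in codes], stride, len(signal))
        recon = [r + q for r, q in zip(recon, quantized)]
        residual = [r - q for r, q in zip(residual, quantized)]
        all_codes.append(codes)
    return all_codes, recon


# Toy signal: a slow component (+1 then -1) plus small fast fluctuations.
signal = [1.0, 1.2, 0.8, 1.0, -1.0, -0.8, -1.2, -1.0]
codebooks = [[-1.0, 0.0, 1.0], [-0.1, 0.0, 0.1], [-0.1, 0.0, 0.1]]
strides = [4, 2, 1]  # coarse-to-fine frame rates (assumed values)

codes, recon = multiscale_rvq(signal, codebooks, strides)
print([len(c) for c in codes])
print(max(abs(a - b) for a, b in zip(signal, recon)))
```

In this toy setting the three stages emit 2, 4, and 8 codes, so 14 codes in total, whereas three full-rate stages would emit 24. The coarse stage captures the slow component and the finer stages only need to encode what remains.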

Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks

Sep 13, 2024

Florian Grötschla, Luca Strässle, Luca A. Lanzendörfer, Roger Wattenhofer

Abstract: Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. Although these relationships provide valuable insights for predictions, new music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods. While previous approaches have relied on hand-crafted audio features for this purpose, we explore the use of contrastively pretrained neural audio embedding models, which offer a richer and more nuanced representation of music. Our experiments demonstrate that neural embeddings, particularly those generated with the Contrastive Language-Audio Pretraining (CLAP) model, present a promising approach to enhancing music recommendation tasks within graph-based frameworks.

* Accepted at the 2nd Music Recommender Workshop (@RecSys)

GraphFSA: A Finite State Automaton Framework for Algorithmic Learning on Graphs

Aug 20, 2024

Florian Grötschla, Joël Mathys, Christoffer Raun, Roger Wattenhofer

Abstract: Many graph algorithms can be viewed as sets of rules that are iteratively applied, with the number of iterations dependent on the size and complexity of the input graph. Existing machine learning architectures often struggle to represent these algorithmic decisions as discrete state transitions. Therefore, we propose a novel framework: GraphFSA (Graph Finite State Automaton). GraphFSA is designed to learn a finite state automaton that runs on each node of a given graph. We test GraphFSA on cellular automata problems, showcasing its abilities in a straightforward algorithmic setting. For a comprehensive empirical evaluation of our framework, we create a diverse range of synthetic problems. As our main application, we then focus on learning more elaborate graph algorithms. Our findings suggest that GraphFSA exhibits strong generalization and extrapolation abilities, presenting an alternative approach to represent these algorithms.
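
As an illustration of what "a finite state automaton that runs on each node" can look like, the sketch below executes a hand-written transition rule synchronously on every node until nothing changes. It is only a sketch under stated assumptions: the rule here is fixed rather than learned, and the choice to aggregate neighbors as the set of states present among them, as well as all names, are assumptions not taken from the paper.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Graph = Dict[int, List[int]]  # adjacency list of an undirected graph
State = int
# A transition maps (own state, set of neighbor states) to the next state.
Transition = Callable[[State, FrozenSet[State]], State]


def run_node_automaton(
    graph: Graph,
    init: Dict[int, State],
    transition: Transition,
    max_steps: int = 1000,
) -> Tuple[Dict[int, State], int]:
    """Apply the same automaton on all nodes in lockstep until a fixed point."""
    states = dict(init)
    for step in range(max_steps):
        new_states = {
            v: transition(states[v], frozenset(states[u] for u in graph[v]))
            for v in graph
        }
        if new_states == states:  # no node changed: the computation has halted
            return states, step
        states = new_states
    return states, max_steps


def spread(own: State, neighbor_states: FrozenSet[State]) -> State:
    """Hand-written rule: become 'reached' (1) once any neighbor is reached."""
    return 1 if own == 1 or 1 in neighbor_states else 0


# Path 0-1-2-3 plus an isolated node 4; node 0 starts as the source.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
final, steps = run_node_automaton(graph, {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}, spread)
print(final, steps)
```

The rule itself never changes, but the number of synchronous rounds needed grows with the distance from the source, which mirrors the abstract's point that the iteration count depends on the input graph.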

* Published as a conference paper at ECAI 2024

AEye: A Visualization Tool for Image Datasets

Aug 07, 2024

Florian Grötschla, Luca A. Lanzendörfer, Marco Calzavara, Roger Wattenhofer

Abstract: Image datasets serve as the foundation for machine learning models in computer vision, significantly influencing model capabilities, performance, and biases alongside architectural considerations. Therefore, understanding the composition and distribution of these datasets has become increasingly crucial. To address the need for intuitive exploration of these datasets, we propose AEye, an extensible and scalable visualization tool tailored to image datasets. AEye utilizes a contrastively trained model to embed images into semantically meaningful high-dimensional representations, facilitating data clustering and organization. To visualize the high-dimensional representations, we project them onto a two-dimensional plane and arrange images in layers so users can seamlessly navigate and explore them interactively. AEye facilitates semantic search functionalities for both text and image queries, enabling users to search for content. We open-source the codebase for AEye, and provide a simple configuration to add datasets.

* Accepted at IEEE VIS 2024

Benchmarking GNNs Using Lightning Network Data

Jul 05, 2024

Rainer Feichtinger, Florian Grötschla, Lioba Heimbach, Roger Wattenhofer

Abstract: The Bitcoin Lightning Network is a layer 2 protocol designed to facilitate fast and inexpensive Bitcoin transactions. It operates by establishing channels between users, where Bitcoin is locked and transactions are conducted off-chain until the channels are closed, with only the initial and final transactions recorded on the blockchain. Routing transactions through intermediary nodes is crucial for users without direct channels, allowing these routing nodes to collect fees for their services. Nodes announce their channels to the network, forming a graph with channels as edges. In this paper, we analyze the graph structure of the Lightning Network and investigate the statistical relationships between node properties using machine learning, particularly Graph Neural Networks (GNNs). We formulate a series of tasks to explore these relationships and provide benchmarks for GNN architectures, demonstrating how topological and neighbor information enhances performance. Our evaluation of several models reveals the effectiveness of GNNs in these tasks and highlights the insights gained from their application.
