Luca A. Lanzendörfer

Parametric Neural Amp Modeling with Active Learning

Jul 02, 2025

Florian Grötschla, Luca A. Lanzendörfer, Longxiang Jiao, Roger Wattenhofer

Abstract:We introduce PANAMA, an active learning framework for the training of end-to-end parametric guitar amp models using a WaveNet-like architecture. With PANAMA, one can create a virtual amp by recording samples that are determined by an active learning strategy to use a minimum amount of datapoints (i.e., amp knob settings). We show that gradient-based optimization algorithms can be used to determine the optimal datapoints to sample, and that the approach helps under a constrained number of samples.

* Accepted at ISMIR 2025 as Late-Breaking Demo (LBD)

Benchmarking Music Generation Models and Metrics via Human Preference Studies

Jun 23, 2025

Florian Grötschla, Ahmet Solak, Luca A. Lanzendörfer, Roger Wattenhofer

Abstract:Recent advancements have brought generated music closer to human-created compositions, yet evaluating these models remains challenging. While human preference is the gold standard for assessing quality, translating these subjective judgments into objective metrics, particularly for text-audio alignment and music quality, has proven difficult. In this work, we generate 6k songs using 12 state-of-the-art models and conduct a survey of 15k pairwise audio comparisons with 2.5k human participants to evaluate the correlation between human preferences and widely used metrics. To the best of our knowledge, this work is the first to rank current state-of-the-art music generation models and metrics based on human preference. To further the field of subjective metric evaluation, we provide open access to our dataset of generated music and human evaluations.

* In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
* Accepted at ICASSP 2025
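
The comparison described in this abstract, between pairwise human preferences and an automatic metric, can be illustrated with a small sketch. Everything below is an assumption for illustration only: the model names, the votes, the metric scores, and the choice of win rate plus Spearman rank correlation are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

import numpy as np


def win_rates(comparisons):
    """Fraction of pairwise comparisons each model won.

    `comparisons` is a list of (winner, loser) tuples.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {model: wins[model] / total[model] for model in total}


def spearman(a, b):
    """Spearman rank correlation for score lists without ties."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Toy votes (hypothetical): each tuple is (preferred model, other model).
comparisons = [("A", "B"), ("A", "C"), ("B", "C"),
               ("A", "B"), ("B", "C"), ("C", "A")]
# Toy automatic metric (hypothetical), higher is better.
metric = {"A": 0.9, "B": 0.4, "C": 0.6}

rates = win_rates(comparisons)
models = sorted(rates)
rho = spearman([rates[m] for m in models], [metric[m] for m in models])
print(rates)  # {'A': 0.75, 'B': 0.5, 'C': 0.25}
print(rho)
```

A rank correlation near 1 would mean the metric orders the models the way listeners do; a value near 0 would mean it carries little information about human preference.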

High-Fidelity Music Vocoder using Neural Audio Codecs

Feb 18, 2025

Luca A. Lanzendörfer, Florian Grötschla, Michael Ungersböck, Roger Wattenhofer

Abstract:While neural vocoders have made significant progress in high-fidelity speech synthesis, their application on polyphonic music has remained underexplored. In this work, we propose DisCoder, a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms. Our approach first transforms the mel spectrogram into a lower-dimensional representation aligned with the Descript Audio Codec (DAC) latent space before reconstructing it to an audio signal using a fine-tuned DAC decoder. DisCoder achieves state-of-the-art performance in music synthesis on several objective metrics and in a MUSHRA listening study. Our approach also shows competitive performance in speech synthesis, highlighting its potential as a universal vocoder.

* Accepted at ICASSP 2025

Audio Atlas: Visualizing and Exploring Audio Datasets

Nov 30, 2024

Luca A. Lanzendörfer, Florian Grötschla, Uzeyir Valizada, Roger Wattenhofer

Abstract:We introduce Audio Atlas, an interactive web application for visualizing audio data using text-audio embeddings. Audio Atlas is designed to facilitate the exploration and analysis of audio datasets using a contrastive embedding model and a vector database for efficient data management and semantic search. The system maps audio embeddings into a two-dimensional space and leverages DeepScatter for dynamic visualization. Designed for extensibility, Audio Atlas allows easy integration of new datasets, enabling users to better understand their audio data and identify both patterns and outliers. We open-source the codebase of Audio Atlas, and provide an initial implementation containing various audio and music datasets.

* Extended Abstract at ISMIR 2024
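
The two operations named in the Audio Atlas abstract, semantic search over embeddings and a mapping into two dimensions, can be sketched as follows. This is a minimal illustration, not the Audio Atlas implementation: random vectors stand in for the contrastive embeddings, a brute-force cosine search stands in for the vector database, and PCA is an assumed choice of projection, since the abstract does not say which method is used.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def semantic_search(query_emb, item_embs, k=3):
    """Return indices and cosine scores of the k items closest to the query."""
    scores = normalize(item_embs) @ normalize(query_emb)
    top = np.argsort(-scores)[:k]
    return top, scores[top]


def project_2d(embs):
    """Project embeddings onto their first two principal components."""
    centered = embs - embs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
item_embs = rng.normal(size=(100, 16))  # stand-in for audio embeddings
# Stand-in for a text query embedding that lies close to item 42.
query = item_embs[42] + 0.01 * rng.normal(size=16)

top, scores = semantic_search(query, item_embs)
coords = project_2d(item_embs)
print(top[0])        # 42
print(coords.shape)  # (100, 2)
```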

SNAC: Multi-Scale Neural Audio Codec

Oct 18, 2024

Hubert Siuzdak, Florian Grötschla, Luca A. Lanzendörfer

Abstract:Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual Vector Quantization (RVQ) has become the standard technique for neural audio compression using a cascade of VQ codebooks. This paper proposes the Multi-Scale Neural Audio Codec, a simple extension of RVQ where the quantizers can operate at different temporal resolutions. By applying a hierarchy of quantizers at variable frame rates, the codec adapts to the audio structure across multiple timescales. This leads to more efficient compression, as demonstrated by extensive objective and subjective evaluations. The code and model weights are open-sourced at https://github.com/hubertsiuzdak/snac.
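
The idea of residual quantizers running at different temporal resolutions can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the released SNAC code: the codebooks are random rather than learned, downsampling is assumed to be average pooling, upsampling is assumed to be frame repetition, and the function names and strides are hypothetical.

```python
import numpy as np


def quantize(x, codebook):
    """Nearest-neighbour VQ: x is (T, D), codebook is (K, D)."""
    # Squared Euclidean distance between every frame and every code.
    dist = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dist.argmin(axis=1)
    return idx, codebook[idx]


def multiscale_rvq(x, codebooks, strides):
    """Residual VQ where stage i runs at 1/strides[i] of the input frame rate.

    The number of frames T must be divisible by every stride.
    """
    T, D = x.shape
    residual = x.copy()
    recon = np.zeros_like(x)
    codes = []
    for codebook, s in zip(codebooks, strides):
        # Downsample the residual by averaging windows of s frames.
        pooled = residual.reshape(T // s, s, D).mean(axis=1)
        idx, q = quantize(pooled, codebook)
        # Upsample to the full frame rate by repeating each quantized frame.
        q_up = np.repeat(q, s, axis=0)
        recon += q_up
        residual -= q_up  # the next stage quantizes what is left over
        codes.append(idx)
    return codes, recon


rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))  # 16 latent frames of dimension 4
codebooks = [rng.normal(size=(32, 4)) for _ in range(3)]
strides = [4, 2, 1]  # coarse to fine

codes, recon = multiscale_rvq(x, codebooks, strides)
print([len(c) for c in codes])  # [4, 8, 16]
```

In this sketch the coarse stages emit fewer codes per second than a plain RVQ stack would, which is where the bitrate saving would come from.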

Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks

Sep 13, 2024

Florian Grötschla, Luca Strässle, Luca A. Lanzendörfer, Roger Wattenhofer

Abstract:Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. Although these relationships provide valuable insights for predictions, new music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods. While previous approaches have relied on hand-crafted audio features for this purpose, we explore the use of contrastively pretrained neural audio embedding models, which offer a richer and more nuanced representation of music. Our experiments demonstrate that neural embeddings, particularly those generated with the Contrastive Language-Audio Pretraining (CLAP) model, present a promising approach to enhancing music recommendation tasks within graph-based frameworks.

* Accepted at the 2nd Music Recommender Workshop (@RecSys)

AEye: A Visualization Tool for Image Datasets

Aug 07, 2024

Florian Grötschla, Luca A. Lanzendörfer, Marco Calzavara, Roger Wattenhofer

Abstract:Image datasets serve as the foundation for machine learning models in computer vision, significantly influencing model capabilities, performance, and biases alongside architectural considerations. Therefore, understanding the composition and distribution of these datasets has become increasingly crucial. To address the need for intuitive exploration of these datasets, we propose AEye, an extensible and scalable visualization tool tailored to image datasets. AEye utilizes a contrastively trained model to embed images into semantically meaningful high-dimensional representations, facilitating data clustering and organization. To visualize the high-dimensional representations, we project them onto a two-dimensional plane and arrange images in layers so users can seamlessly navigate and explore them interactively. AEye facilitates semantic search functionalities for both text and image queries, enabling users to search for content. We open-source the codebase for AEye, and provide a simple configuration to add datasets.

* Accepted at IEEE VIS 2024

Cue Point Estimation using Object Detection

Jul 09, 2024

Giulia Argüello, Luca A. Lanzendörfer, Roger Wattenhofer

Abstract:Cue points indicate possible temporal boundaries in a transition between two pieces of music in DJ mixing and constitute a crucial element in autonomous DJ systems as well as for live mixing. In this work, we present a novel method for automatic cue point estimation, interpreted as a computer vision object detection task. Our proposed system is based on a pre-trained object detection transformer which we fine-tune on our novel cue point dataset. Our provided dataset contains 21k manually annotated cue points from human experts as well as metronome information for nearly 5k individual tracks, making this dataset 35x larger than the previously available cue point dataset. Unlike previous methods, our approach does not require low-level musical information analysis, while demonstrating increased precision in retrieving cue point positions. Moreover, our proposed method demonstrates high adherence to phrasing, a type of high-level music structure commonly emphasized in electronic dance music. The code, model checkpoints, and dataset are made publicly available.

PUZZLES: A Benchmark for Neural Algorithmic Reasoning

Jun 29, 2024

Benjamin Estermann, Luca A. Lanzendörfer, Yannick Niedermayr, Roger Wattenhofer

Abstract:Algorithmic reasoning is a fundamental cognitive ability that plays a pivotal role in problem-solving and decision-making processes. Reinforcement Learning (RL) has demonstrated remarkable proficiency in tasks such as motor control, handling perceptual input, and managing stochastic environments. These advancements have been enabled in part by the availability of benchmarks. In this work we introduce PUZZLES, a benchmark based on Simon Tatham's Portable Puzzle Collection, aimed at fostering progress in algorithmic and logical reasoning in RL. PUZZLES contains 40 diverse logic puzzles of adjustable sizes and varying levels of complexity; many puzzles also feature a diverse set of additional configuration parameters. The 40 puzzles provide detailed information on the strengths and generalization capabilities of RL agents. Furthermore, we evaluate various RL algorithms on PUZZLES, providing baseline comparisons and demonstrating the potential for future research. All the software, including the environment, is available at https://github.com/ETH-DISCO/rlp.

An LLM-based Recommender System Environment

Jun 01, 2024

Nathan Corecco, Giorgio Piatti, Luca A. Lanzendörfer, Flint Xiaofeng Fan, Roger Wattenhofer

Abstract:Reinforcement learning (RL) has gained popularity in the realm of recommender systems due to its ability to optimize long-term rewards and guide users in discovering relevant content. However, the successful implementation of RL in recommender systems is challenging because of several factors, including the limited availability of online data for training on-policy methods. This scarcity requires expensive human interaction for online model training. Furthermore, the development of effective evaluation frameworks that accurately reflect the quality of models remains a fundamental challenge in recommender systems. To address these challenges, we propose a comprehensive framework for synthetic environments that simulate human behavior by harnessing the capabilities of large language models (LLMs). We complement our framework with in-depth ablation studies and demonstrate its effectiveness with experiments on movie and book recommendations. By utilizing LLMs as synthetic users, this work introduces a modular and novel framework for training RL-based recommender systems. The software, including the RL environment, is publicly available.
