Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Magdalena Proszewska

B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data

May 28, 2025

Magdalena Proszewska, Tomasz Danel, Dawid Rymarczyk

Abstract:Understanding the reasoning behind deep learning model predictions is crucial in cheminformatics and drug discovery, where molecular design determines their properties. However, current evaluation frameworks for Explainable AI (XAI) in this domain often rely on artificial datasets or simplified tasks, employing data-derived metrics that fail to capture the complexity of real-world scenarios and lack a direct link to explanation faithfulness. To address this, we introduce B-XAIC, a novel benchmark constructed from real-world molecular data and diverse tasks with known ground-truth rationales for assigned labels. Through a comprehensive evaluation using B-XAIC, we reveal limitations of existing XAI methods for Graph Neural Networks (GNNs) in the molecular domain. This benchmark provides a valuable resource for gaining deeper insights into the faithfulness of XAI, facilitating the development of more reliable and interpretable models.

* 26 pages, 16 figures, 5 tables

Via

Access Paper or Ask Questions

Face Identity-Aware Disentanglement in StyleGAN

Sep 21, 2023

Adrian Suwała, Bartosz Wójcik, Magdalena Proszewska, Jacek Tabor, Przemysław Spurek, Marek Śmieja

Figure 1 for Face Identity-Aware Disentanglement in StyleGAN

Figure 2 for Face Identity-Aware Disentanglement in StyleGAN

Figure 3 for Face Identity-Aware Disentanglement in StyleGAN

Figure 4 for Face Identity-Aware Disentanglement in StyleGAN

Abstract:Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive on the remaining characteristics of the image than in the existing state-of-the-art models.

Via

Access Paper or Ask Questions

Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows

Nov 10, 2022

Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa

Abstract:Regional accents of the same language affect not only how words are pronounced (i.e., phonetic content), but also impact prosodic aspects of speech such as speaking rate and intonation. This paper investigates a novel flow-based approach to accent conversion using normalizing flows. The proposed approach revolves around three steps: remapping the phonetic conditioning, to better match the target accent, warping the duration of the converted speech, to better suit the target phonemes, and an attention mechanism that implicitly aligns source and target speech sequences. The proposed remap-warp-attend system enables adaptation of both phonetic and prosodic aspects of speech while allowing for source and converted speech signals to be of different lengths. Objective and subjective evaluations show that the proposed approach significantly outperforms a competitive CopyCat baseline model in terms of similarity to the target accent, naturalness and intelligibility.

* IEEE Spoken Language Technology Workshop 2022

Via

Access Paper or Ask Questions

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

Jul 04, 2022

Magdalena Proszewska, Grzegorz Beringer, Daniel Sáez-Trigueros, Thomas Merritt, Abdelhamid Ezzerg, Roberto Barra-Chicote

Figure 1 for GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

Figure 2 for GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

Figure 3 for GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

Figure 4 for GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

Abstract:In this paper, we propose GlowVC: a multilingual multi-speaker flow-based model for language-independent text-free voice conversion. We build on Glow-TTS, which provides an architecture that enables use of linguistic features during training without the necessity of using them for VC inference. We consider two versions of our model: GlowVC-conditional and GlowVC-explicit. GlowVC-conditional models the distribution of mel-spectrograms with speaker-conditioned flow and disentangles the mel-spectrogram space into content- and pitch-relevant dimensions, while GlowVC-explicit models the explicit distribution with unconditioned flow and disentangles said space into content-, pitch- and speaker-relevant dimensions. We evaluate our models in terms of intelligibility, speaker similarity and naturalness for intra- and cross-lingual conversion in seen and unseen languages. GlowVC models greatly outperform AutoVC baseline in terms of intelligibility, while achieving just as high speaker similarity in intra-lingual VC, and slightly worse in the cross-lingual setting. Moreover, we demonstrate that GlowVC-explicit surpasses both GlowVC-conditional and AutoVC in terms of naturalness.

* Accepted at Interspeech 2022

Via

Access Paper or Ask Questions

Text-free non-parallel many-to-many voice conversion using normalising flows

Mar 15, 2022

Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, Magdalena Proszewska, Kamil Pokora, Roberto Barra-Chicote, Daniel Korzekwa

Figure 1 for Text-free non-parallel many-to-many voice conversion using normalising flows

Figure 2 for Text-free non-parallel many-to-many voice conversion using normalising flows

Figure 3 for Text-free non-parallel many-to-many voice conversion using normalising flows

Figure 4 for Text-free non-parallel many-to-many voice conversion using normalising flows

Abstract:Non-parallel voice conversion (VC) is typically achieved using lossy representations of the source speech. However, ensuring only speaker identity information is dropped whilst all other information from the source speech is retained is a large challenge. This is particularly challenging in the scenario where at inference-time we have no knowledge of the text being read, i.e., text-free VC. To mitigate this, we investigate information-preserving VC approaches. Normalising flows have gained attention for text-to-speech synthesis, however have been under-explored for VC. Flows utilize invertible functions to learn the likelihood of the data, thus provide a lossless encoding of speech. We investigate normalising flows for VC in both text-conditioned and text-free scenarios. Furthermore, for text-free VC we compare pre-trained and jointly-learnt priors. Flow-based VC evaluations show no degradation between text-free and text-conditioned VC, resulting in improvements over the state-of-the-art. Also, joint-training of the prior is found to negatively impact text-free VC quality.

Via

Access Paper or Ask Questions

HyperCube: Implicit Field Representations of Voxelized 3D Models

Oct 12, 2021

Magdalena Proszewska, Marcin Mazur, Tomasz Trzciński, Przemysław Spurek

Figure 1 for HyperCube: Implicit Field Representations of Voxelized 3D Models

Figure 2 for HyperCube: Implicit Field Representations of Voxelized 3D Models

Figure 3 for HyperCube: Implicit Field Representations of Voxelized 3D Models

Figure 4 for HyperCube: Implicit Field Representations of Voxelized 3D Models

Abstract:Recently introduced implicit field representations offer an effective way of generating 3D object shapes. They leverage implicit decoder trained to take a 3D point coordinate concatenated with a shape encoding and to output a value which indicates whether the point is outside the shape or not. Although this approach enables efficient rendering of visually plausible objects, it has two significant limitations. First, it is based on a single neural network dedicated for all objects from a training set which results in a cumbersome training procedure and its application in real life. More importantly, the implicit decoder takes only points sampled within voxels (and not the entire voxels) which yields problems at the classification boundaries and results in empty spaces within the rendered mesh. To solve the above limitations, we introduce a new HyperCube architecture based on interval arithmetic network, that enables direct processing of 3D voxels, trained using a hypernetwork paradigm to enforce model convergence. Instead of processing individual 3D samples from within a voxel, our approach allows to input the entire voxel (3D cube) represented with its convex hull coordinates, while the target network constructed by a hypernet assigns it to an inside or outside category. As a result our HyperCube model outperforms the competing approaches both in terms of training and inference efficiency, as well as the final mesh quality.

Via

Access Paper or Ask Questions

PluGeN: Multi-Label Conditional Generation From Pre-Trained Models

Sep 18, 2021

Maciej Wołczyk, Magdalena Proszewska, Łukasz Maziarka, Maciej Zięba, Patryk Wielopolski, Rafał Kurczab, Marek Śmieja

Figure 1 for PluGeN: Multi-Label Conditional Generation From Pre-Trained Models

Figure 2 for PluGeN: Multi-Label Conditional Generation From Pre-Trained Models

Figure 3 for PluGeN: Multi-Label Conditional Generation From Pre-Trained Models

Figure 4 for PluGeN: Multi-Label Conditional Generation From Pre-Trained Models

Abstract:Modern generative models achieve excellent quality in a variety of tasks including image or text generation and chemical molecule modeling. However, existing methods often lack the essential ability to generate examples with requested properties, such as the age of the person in the photo or the weight of the generated molecule. Incorporating such additional conditioning factors would require rebuilding the entire architecture and optimizing the parameters from scratch. Moreover, it is difficult to disentangle selected attributes so that to perform edits of only one attribute while leaving the others unchanged. To overcome these limitations we propose PluGeN (Plugin Generative Network), a simple yet effective generative technique that can be used as a plugin to pre-trained generative models. The idea behind our approach is to transform the entangled latent representation using a flow-based module into a multi-dimensional space where the values of each attribute are modeled as an independent one-dimensional distribution. In consequence, PluGeN can generate new samples with desired attributes as well as manipulate labeled attributes of existing examples. Due to the disentangling of the latent representation, we are even able to generate samples with rare or unseen combinations of attributes in the dataset, such as a young person with gray hair, men with make-up, or women with beards. We combined PluGeN with GAN and VAE models and applied it to conditional generation and manipulation of images and chemical molecule modeling. Experiments demonstrate that PluGeN preserves the quality of backbone models while adding the ability to control the values of labeled attributes.

Via

Access Paper or Ask Questions