Adrián Csiszárik

Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models

Aug 22, 2023

Adrián Csiszárik, Melinda F. Kiss, Péter Kőrösi-Szabó, Márton Muntag, Gergely Papp, Dániel Varga

Abstract: We explore element-wise convex combinations of two permutation-aligned neural network parameter vectors $\Theta_A$ and $\Theta_B$ of size $d$. We conduct extensive experiments by examining various distributions of such model combinations parametrized by elements of the hypercube $[0,1]^{d}$ and its vicinity. Our findings reveal that broad regions of the hypercube form surfaces of low loss values, indicating that the notion of linear mode connectivity extends to a more general phenomenon which we call mode combinability. We also make several novel observations regarding linear mode connectivity and model re-basin. We demonstrate a transitivity property: two models re-based to a common third model are also linear mode connected, and a robustness property: even with significant perturbations of the neuron matchings the resulting combinations continue to form a working model. Moreover, we analyze the functional and weight similarity of model combinations and show that such combinations are non-vacuous in the sense that there are significant functional differences between the resulting models.
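The element-wise convex combination described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `combine` and `sample_hypercube_point`, the toy vectors, and the convention that each coefficient weights $\Theta_A$ (with its complement weighting $\Theta_B$) are all assumptions made here. Both parameter vectors are assumed to be already permutation aligned.

```python
import random


def combine(theta_a, theta_b, lam):
    """Element-wise combination lam_i * a_i + (1 - lam_i) * b_i.

    lam is a point of the hypercube [0, 1]^d. Assumed convention:
    lam_i weights theta_a and (1 - lam_i) weights theta_b.
    """
    if not (len(theta_a) == len(theta_b) == len(lam)):
        raise ValueError("theta_a, theta_b and lam must all have length d")
    return [l * a + (1.0 - l) * b for a, b, l in zip(theta_a, theta_b, lam)]


def sample_hypercube_point(d, rng):
    """Draw one coefficient vector uniformly from [0, 1)^d."""
    return [rng.random() for _ in range(d)]


# Toy parameter vectors standing in for two aligned models (hypothetical values).
theta_a = [1.0, 2.0, 3.0, 4.0]
theta_b = [-1.0, 0.0, 5.0, 4.0]

# A constant coefficient vector recovers ordinary linear interpolation.
midpoint = combine(theta_a, theta_b, [0.5] * 4)

# A vertex of the hypercube picks each parameter from exactly one model.
vertex = combine(theta_a, theta_b, [1.0, 0.0, 1.0, 0.0])

# A random interior point mixes every coordinate with its own coefficient.
rng = random.Random(0)
random_mix = combine(theta_a, theta_b, sample_hypercube_point(4, rng))
```

Linear mode connectivity corresponds to the diagonal of the hypercube (all coefficients equal), which is why the abstract frames mode combinability as a generalization of it.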

Similarity and Matching of Neural Network Representations

Oct 27, 2021

Adrián Csiszárik, Péter Kőrösi-Szabó, Ákos K. Matszangosz, Gergely Papp, Dániel Varga

Abstract: We employ a toolset -- dubbed Dr. Frankenstein -- to analyse the similarity of representations in deep neural networks. With this toolset, we aim to match the activations on given layers of two trained neural networks by joining them with a stitching layer. We demonstrate that the inner representations emerging in deep convolutional neural networks with the same architecture but different initializations can be matched with a surprisingly high degree of accuracy even with a single, affine stitching layer. We choose the stitching layer from several possible classes of linear transformations and investigate their performance and properties. The task of matching representations is closely related to notions of similarity. Using this toolset, we also provide a novel viewpoint on the current line of research regarding similarity indices of neural network representations: the perspective of the performance on a task.
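To make the idea of an affine stitching layer concrete, the sketch below fits a map from the activations of one network to those of another by ordinary least squares. This is only an illustration under stated assumptions: the abstract does not say how the stitching layer is fitted, so the least-squares objective, the helper names (`solve`, `fit_affine_stitch`, `stitch`), and the toy activations are all hypothetical choices made here, not the Dr. Frankenstein toolset itself.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is a square matrix, b a matrix with the same number of rows.
    """
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_affine_stitch(acts_a, acts_b):
    """Least-squares affine map from activations of model A to those of model B.

    Returns a (p + 1) x q weight matrix; its last row is the bias.
    """
    xs = [list(x) + [1.0] for x in acts_a]  # append 1 to absorb the bias
    p, q = len(xs[0]), len(acts_b[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [[sum(x[i] * y[k] for x, y in zip(xs, acts_b)) for k in range(q)]
           for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X^T X) W = X^T Y


def stitch(weights, x):
    """Apply the fitted affine stitching layer to one activation vector."""
    xa = list(x) + [1.0]
    return [sum(xa[j] * weights[j][k] for j in range(len(xa)))
            for k in range(len(weights[0]))]


# Toy activations: model B's layer is an exact affine image of model A's.
acts_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
acts_b = [[2.0 * x0 - x1 + 1.0, 0.5 * x0 + 3.0 * x1 - 2.0] for x0, x1 in acts_a]

weights = fit_affine_stitch(acts_a, acts_b)
stitched = [stitch(weights, x) for x in acts_a]
```

In a real setting the stitched activations would be fed into the remaining layers of the second network, and the quality of the match would be judged by task performance, which is the viewpoint the abstract emphasizes.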

* To appear in the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

Visualizing Transfer Learning

Jul 15, 2020

Róbert Szabó, Dániel Katona, Márton Csillag, Adrián Csiszárik, Dániel Varga

Abstract: We provide visualizations of individual neurons of a deep image recognition network during the temporal process of transfer learning. These visualizations qualitatively demonstrate various novel properties of the transfer learning process regarding the speed and characteristics of adaptation, neuron reuse, spatial scale of the represented image features, and behavior of transfer learning to small data. We publish the large-scale dataset that we have created for the purposes of this analysis.

* 2020 ICML Workshop on Human Interpretability in Machine Learning (WHI 2020)

Negative Sampling in Variational Autoencoders

Oct 07, 2019

Adrián Csiszárik, Beatrix Benkő, Dániel Varga

Abstract: We propose negative sampling as an approach to improve the notoriously bad out-of-distribution likelihood estimates of Variational Autoencoder models. Our model pushes latent images of negative samples away from the prior. When the source of negative samples is an auxiliary dataset, such a model can vastly improve on baselines when evaluated on OOD detection tasks. Perhaps more surprisingly, we present a fully unsupervised variant that can also significantly improve detection performance: using the output of the generator as negative samples results in a fully unsupervised model that can be interpreted as adversarially trained.
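One way to quantify how far a latent image lies from the prior is the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The sketch below assumes exactly that setup (diagonal Gaussian encoder output, standard normal prior), which is the usual VAE configuration but is not stated in the abstract; the function name `kl_to_standard_normal` and the example values are hypothetical, and how the paper turns such a distance into a training loss is not specified here.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    Each latent dimension contributes 0.5 * (mu^2 + var - 1 - log var).
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# A latent image that coincides with the prior has zero divergence.
kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# A latent image whose mean has been pushed away from the origin
# (as one would want for a negative sample) has a large divergence.
kl_pushed_away = kl_to_standard_normal([3.0, 0.0], [0.0, 0.0])
```

Under these assumptions, a model that keeps this quantity small for training data and large for negative samples could use it as a score for OOD detection.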

Towards Finding Longer Proofs

May 30, 2019

Zsolt Zombori, Adrián Csiszárik, Henryk Michalewski, Cezary Kaliszyk, Josef Urban

Abstract: We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). FLoP focuses on generalizing from short proofs to longer ones of similar structure. To achieve that, FLoP uses state-of-the-art RL approaches that were previously not applied in theorem proving. In particular, we show that curriculum learning significantly outperforms previous learning-based proof guidance on a synthetic dataset of increasingly difficult arithmetic problems.

* 9 pages, 5 figures

Gradient Regularization Improves Accuracy of Discriminative Models

May 24, 2018

Dániel Varga, Adrián Csiszárik, Zsolt Zombori

Abstract: Regularizing the gradient norm of the output of a neural network with respect to its inputs is a powerful technique, rediscovered several times. This paper presents evidence that gradient regularization can consistently improve classification accuracy on vision tasks, using modern deep neural networks, especially when the amount of training data is small. We introduce our regularizers as members of a broader class of Jacobian-based regularizers. We demonstrate empirically on real and synthetic data that the learning process leads to gradients controlled beyond the training points, and results in solutions that generalize well.
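The general shape of an input-gradient penalty can be sketched on a model small enough to differentiate by hand. The example below is an assumption-laden illustration rather than the paper's regularizer: it uses a single logistic unit instead of a deep network, penalizes the squared norm of the gradient of the predicted probability with respect to the input, and introduces hypothetical names (`predict`, `input_gradient`, `regularized_loss`) and a hypothetical weight `lam`.

```python
import math


def predict(w, b, x):
    """Logistic model: probability of the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x):
    """Gradient of the predicted probability with respect to the input x.

    For a logistic unit, d p / d x_i = p * (1 - p) * w_i.
    """
    p = predict(w, b, x)
    return [p * (1.0 - p) * wi for wi in w]


def regularized_loss(w, b, x, y, lam):
    """Cross-entropy plus lam times the squared norm of the input gradient."""
    p = predict(w, b, x)
    cross_entropy = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    penalty = sum(g * g for g in input_gradient(w, b, x))
    return cross_entropy + lam * penalty


# Hypothetical parameters and a single labelled example.
w, b = [0.8, -1.2], 0.1
x, y = [0.5, 0.25], 1.0

plain = regularized_loss(w, b, x, y, lam=0.0)
penalized = regularized_loss(w, b, x, y, lam=0.5)
```

For a deep network the input gradient would normally come from automatic differentiation rather than a hand-derived formula; the structure of the objective (task loss plus a weighted gradient-norm term) stays the same.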
