Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yani Ioannou

Department of Electrical and Software Engineering, University of Calgary, Calgary, Canada

Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

May 08, 2025

Mohammed Adnan, Rohan Jain, Ekansh Sharma, Rahul Krishnan, Yani Ioannou

Abstract:The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding a LTH solution is computationally expensive, and a LTH sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and proposes a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations and propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random init. We empirically show a significant increase in generalization when sparse training from random initialization with the permuted mask as compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10, CIFAR-100 and ImageNet) and models (VGG11, ResNet20 and ResNet50).

* Accepted at ICML 2025

Via

Access Paper or Ask Questions

Learning Parameter Sharing with Tensor Decompositions and Sparsity

Nov 14, 2024

Cem Üyük, Mike Lasby, Mohamed Yassin, Utku Evci, Yani Ioannou

Figure 1 for Learning Parameter Sharing with Tensor Decompositions and Sparsity

Figure 2 for Learning Parameter Sharing with Tensor Decompositions and Sparsity

Figure 3 for Learning Parameter Sharing with Tensor Decompositions and Sparsity

Figure 4 for Learning Parameter Sharing with Tensor Decompositions and Sparsity

Abstract:Large neural networks achieve remarkable performance, but their size hinders deployment on resource-constrained devices. While various compression techniques exist, parameter sharing remains relatively unexplored. This paper introduces Fine-grained Parameter Sharing (FiPS), a novel algorithm that leverages the relationship between parameter sharing, tensor decomposition, and sparsity to efficiently compress large vision transformer models. FiPS employs a shared base and sparse factors to represent shared neurons across multi-layer perception (MLP) modules. Shared parameterization is initialized via Singular Value Decomposition (SVD) and optimized by minimizing block-wise reconstruction error. Experiments demonstrate that FiPS compresses DeiT-B and Swin-L MLPs to 25-40% of their original parameter count while maintaining accuracy within 1 percentage point of the original models.

Via

Access Paper or Ask Questions

Navigating Extremes: Dynamic Sparsity in Large Output Space

Nov 05, 2024

Nasib Ullah, Erik Schultheis, Mike Lasby, Yani Ioannou, Rohit Babbar

Abstract:In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classification with large output spaces, where memory-efficiency is paramount. With a label space of possibly millions of candidates, the classification layer alone will consume several gigabytes of memory. Switching from a dense to a fixed fan-in sparse layer updated with sparse evolutionary training (SET); however, severely hampers training convergence, especially at the largest label spaces. We find that poor gradient flow from the sparse classifier to the dense text encoder make it difficult to learn good input representations. By employing an intermediate layer or adding an auxiliary training objective, we recover most of the generalisation performance of the dense model. Overall, we demonstrate the applicability and practical benefits of DST in a challenging domain -- characterized by a highly skewed label distribution that differs substantially from typical DST benchmark datasets -- which enables end-to-end training with millions of labels on commodity hardware.

* 20 pages, 7 figures, NeurIPS 2024

Via

Access Paper or Ask Questions

What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias

Oct 10, 2024

Aida Mohammadshahi, Yani Ioannou

Abstract:Knowledge Distillation is a commonly used Deep Neural Network compression method, which often maintains overall generalization performance. However, we show that even for balanced image classification datasets, such as CIFAR-100, Tiny ImageNet and ImageNet, as many as 41% of the classes are statistically significantly affected by distillation when comparing class-wise accuracy (i.e. class bias) between a teacher/distilled student or distilled student/non-distilled student model. Changes in class bias are not necessarily an undesirable outcome when considered outside of the context of a model's usage. Using two common fairness metrics, Demographic Parity Difference (DPD) and Equalized Odds Difference (EOD) on models trained with the CelebA, Trifeature, and HateXplain datasets, our results suggest that increasing the distillation temperature improves the distilled student model's fairness -- for DPD, the distilled student even surpasses the fairness of the teacher model at high temperatures. This study highlights the uneven effects of Knowledge Distillation on certain classes and its potentially significant role in fairness, emphasizing that caution is warranted when using distilled models for sensitive application domains.

Via

Access Paper or Ask Questions

Trustworthy and Responsible AI for Human-Centric Autonomous Decision-Making Systems

Sep 02, 2024

Farzaneh Dehghani, Mahsa Dibaji, Fahim Anzum, Lily Dey, Alican Basdemir, Sayeh Bayat, Jean-Christophe Boucher, Steve Drew, Sarah Elaine Eaton, Richard Frayne(+16 more)

Abstract:Artificial Intelligence (AI) has paved the way for revolutionary decision-making processes, which if harnessed appropriately, can contribute to advancements in various sectors, from healthcare to economics. However, its black box nature presents significant ethical challenges related to bias and transparency. AI applications are hugely impacted by biases, presenting inconsistent and unreliable findings, leading to significant costs and consequences, highlighting and perpetuating inequalities and unequal access to resources. Hence, developing safe, reliable, ethical, and Trustworthy AI systems is essential. Our team of researchers working with Trustworthy and Responsible AI, part of the Transdisciplinary Scholarship Initiative within the University of Calgary, conducts research on Trustworthy and Responsible AI, including fairness, bias mitigation, reproducibility, generalization, interpretability, and authenticity. In this paper, we review and discuss the intricacies of AI biases, definitions, methods of detection and mitigation, and metrics for evaluating bias. We also discuss open challenges with regard to the trustworthiness and widespread application of AI across diverse domains of human-centric decision making, as well as guidelines to foster Responsible and Trustworthy AI models.

* 44 pages, 2 figures

Via

Access Paper or Ask Questions

Meta-GCN: A Dynamically Weighted Loss Minimization Method for Dealing with the Data Imbalance in Graph Neural Networks

Jun 24, 2024

Mahdi Mohammadizadeh, Arash Mozhdehi, Yani Ioannou, Xin Wang

Abstract:Although many real-world applications, such as disease prediction, and fault detection suffer from class imbalance, most existing graph-based classification methods ignore the skewness of the distribution of classes; therefore, tend to be biased towards the majority class(es). Conventional methods typically tackle this problem through the assignment of weights to each one of the class samples based on a function of their loss, which can lead to over-fitting on outliers. In this paper, we propose a meta-learning algorithm, named Meta-GCN, for adaptively learning the example weights by simultaneously minimizing the unbiased meta-data set loss and optimizing the model weights through the use of a small unbiased meta-data set. Through experiments, we have shown that Meta-GCN outperforms state-of-the-art frameworks and other baselines in terms of accuracy, the area under the receiver operating characteristic (AUC-ROC) curve, and macro F1-Score for classification tasks on two different datasets.

Via

Access Paper or Ask Questions

Dynamic Sparse Training with Structured Sparsity

May 03, 2023

Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou

Figure 1 for Dynamic Sparse Training with Structured Sparsity

Figure 2 for Dynamic Sparse Training with Structured Sparsity

Figure 3 for Dynamic Sparse Training with Structured Sparsity

Figure 4 for Dynamic Sparse Training with Structured Sparsity

Abstract:DST methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference. Although the resulting models are highly sparse and theoretically cheaper to train, achieving speedups with unstructured sparsity on real-world hardware is challenging. In this work we propose a DST method to learn a variant of structured N:M sparsity, the acceleration of which in general is commonly supported in commodity hardware. Furthermore, we motivate with both a theoretical analysis and empirical results, the generalization performance of our specific N:M sparsity (constant fan-in), present a condensed representation with a reduced parameter and memory footprint, and demonstrate reduced inference time compared to dense models with a naive PyTorch CPU implementation of the condensed representation Our source code is available at https://github.com/calgaryml/condensed-sparsity

* 16 pages, 11 figures

Via

Access Paper or Ask Questions

Bounding generalization error with input compression: An empirical study with infinite-width networks

Jul 19, 2022

Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani Ioannou, Graham W. Taylor

Figure 1 for Bounding generalization error with input compression: An empirical study with infinite-width networks

Figure 2 for Bounding generalization error with input compression: An empirical study with infinite-width networks

Figure 3 for Bounding generalization error with input compression: An empirical study with infinite-width networks

Figure 4 for Bounding generalization error with input compression: An empirical study with infinite-width networks

Abstract:Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce a reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.

* 12 pages main content, 26 pages total

Via

Access Paper or Ask Questions

Monitoring Shortcut Learning using Mutual Information

Jun 27, 2022

Mohammed Adnan, Yani Ioannou, Chuan-Yung Tsai, Angus Galloway, H. R. Tizhoosh, Graham W. Taylor

Figure 1 for Monitoring Shortcut Learning using Mutual Information

Figure 2 for Monitoring Shortcut Learning using Mutual Information

Figure 3 for Monitoring Shortcut Learning using Mutual Information

Figure 4 for Monitoring Shortcut Learning using Mutual Information

Abstract:The failure of deep neural networks to generalize to out-of-distribution data is a well-known problem and raises concerns about the deployment of trained networks in safety-critical domains such as healthcare, finance and autonomous vehicles. We study a particular kind of distribution shift $\unicode{x2013}$ shortcuts or spurious correlations in the training data. Shortcut learning is often only exposed when models are evaluated on real-world data that does not contain the same spurious correlations, posing a serious dilemma for AI practitioners to properly assess the effectiveness of a trained model for real-world applications. In this work, we propose to use the mutual information (MI) between the learned representation and the input as a metric to find where in training, the network latches onto shortcuts. Experiments demonstrate that MI can be used as a domain-agnostic metric for monitoring shortcut learning.

* Accepted at ICML 2022 Workshop on Spurious Correlations, Invariance, and Stability

Via

Access Paper or Ask Questions

Measuring Neural Net Robustness with Constraints

Jun 16, 2017

Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, Antonio Criminisi

Figure 1 for Measuring Neural Net Robustness with Constraints

Figure 2 for Measuring Neural Net Robustness with Constraints

Figure 3 for Measuring Neural Net Robustness with Constraints

Figure 4 for Measuring Neural Net Robustness with Constraints

Abstract:Despite having high accuracy, neural nets have been shown to be susceptible to adversarial examples, where a small perturbation to an input can cause it to become mislabeled. We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. Our algorithm generates more informative estimates of robustness metrics compared to estimates based on existing algorithms. Furthermore, we show how existing approaches to improving robustness "overfit" to adversarial examples generated using a specific algorithm. Finally, we show that our techniques can be used to additionally improve neural net robustness both according to the metrics that we propose, but also according to previously proposed metrics.

Via

Access Paper or Ask Questions