Abstract: Handling the ever-increasing scale of contemporary deep learning and transformer-based models poses a significant challenge. Although great strides have been made in model compression techniques such as model architecture search and knowledge distillation, the data and computational resources these optimizations require remain a considerable hurdle. This paper introduces LayerCollapse, a novel adaptive model compression methodology. LayerCollapse eliminates non-linearities within the network and collapses two consecutive fully connected layers into a single linear transformation, simultaneously reducing both the number of layers and the parameter count and thereby improving model efficiency. We also introduce a compression-aware regularizer, which compresses the model in alignment with dataset quality and model expressiveness, reducing overfitting across tasks. Our results demonstrate LayerCollapse's compression and regularization capabilities on multiple fine-grained classification benchmarks, achieving up to 74% post-training compression with minimal accuracy loss. Compared with knowledge distillation to the same target network, LayerCollapse achieves a five-fold increase in computational efficiency and an 8% improvement in overall accuracy on the ImageNet dataset.
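The collapse step described above reduces to a standard linear-algebra identity: once the intervening non-linearity is removed, two consecutive fully connected layers compose into a single linear transformation. The PyTorch sketch below illustrates that identity; the helper name `collapse_linear_pair` and the layer sizes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: fold y = W2(W1 x + b1) + b2 into y = (W2 W1) x + (W2 b1 + b2),
# which is valid only after the non-linearity between the two layers is removed.
import torch
import torch.nn as nn

def collapse_linear_pair(fc1: nn.Linear, fc2: nn.Linear) -> nn.Linear:
    collapsed = nn.Linear(fc1.in_features, fc2.out_features)
    with torch.no_grad():
        collapsed.weight.copy_(fc2.weight @ fc1.weight)       # W2 W1
        bias = fc2.bias.clone()
        if fc1.bias is not None:
            bias += fc2.weight @ fc1.bias                      # W2 b1 + b2
        collapsed.bias.copy_(bias)
    return collapsed

# Sanity check: the collapsed layer matches the original pair up to float error.
x = torch.randn(4, 128)
fc1, fc2 = nn.Linear(128, 512), nn.Linear(512, 64)
assert torch.allclose(fc2(fc1(x)), collapse_linear_pair(fc1, fc2)(x), atol=1e-5)
```

In this toy configuration the collapsed layer holds 128x64 weights in place of the original 128x512 plus 512x64, which is where the reductions in depth and parameter count come from.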
Abstract: Traditional machine learning training is a static process that lacks real-time adaptability of hyperparameters. Popular solutions for tuning during runtime rely on checkpoints and schedulers; otherwise, adjusting hyperparameters usually requires restarting the program, wasting time and compute utilization while placing unnecessary strain on memory and processors. We present LiveTune, a new framework that allows real-time hyperparameter tuning during training through LiveVariables. LiveVariables enable a continuous training session by storing parameters on designated ports on the system, allowing them to be adjusted dynamically. Extensive evaluations of our framework show savings of up to 60 seconds and 5.4 kilojoules of energy per hyperparameter change.
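As a rough illustration of the LiveVariable idea described above (a parameter held behind a designated port so it can be changed while training is running), the sketch below uses a plain TCP socket. The class name, port, and wire format are assumptions made for illustration, not LiveTune's actual API.

```python
# Hypothetical sketch of a live-tunable value: a float served behind a local port
# so an external client can overwrite it mid-training without a restart.
import socket
import threading

class LiveVariable:
    def __init__(self, value: float, port: int):
        self.value = value
        self._server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._server.bind(("127.0.0.1", port))
        self._server.listen()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            conn, _ = self._server.accept()
            with conn:
                data = conn.recv(64).decode().strip()
                try:
                    self.value = float(data)   # update takes effect immediately
                except ValueError:
                    pass                        # ignore malformed input

# Illustrative use inside a training loop:
#   lr = LiveVariable(1e-3, port=5005)
#   for batch in loader:
#       for group in optimizer.param_groups:
#           group["lr"] = lr.value   # picks up values sent to port 5005
#       ...
# e.g. from a shell:  echo 0.0005 | nc 127.0.0.1 5005
```

Because the training loop re-reads the value each iteration, a change sent to the port is applied on the next step, which is the continuous-session behavior the abstract describes.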