Jovita Lukasik

Smooth Model Compression without Fine-Tuning

May 30, 2025

Christina Runkel, Natacha Kuete Meli, Jovita Lukasik, Ander Biguri, Carola-Bibiane Schönlieb, Michael Moeller

Abstract: Compressing and pruning large machine learning models has become a critical step towards their deployment in real-world applications. Standard pruning and compression techniques are typically designed without taking the structure of the network's weights into account, limiting their effectiveness. We explore the impact of smooth regularization on neural network training and model compression. By applying nuclear norm, first- and second-order derivative penalties of the weights during training, we encourage structured smoothness while preserving predictive performance on par with non-smooth models. We find that standard pruning methods often perform better when applied to these smooth models. Building on this observation, we apply a Singular-Value-Decomposition-based compression method that exploits the underlying smooth structure and approximates the model's weight tensors by smaller low-rank tensors. Our approach enables state-of-the-art compression without any fine-tuning, reaching up to $91\%$ accuracy on a smooth ResNet-18 on CIFAR-10 with $70\%$ fewer parameters.
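The low-rank idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a single 2D weight matrix (the paper works with weight tensors), uses a plain truncated SVD from NumPy, and the helper names `low_rank_factors` and `max_rank_for_reduction` as well as the synthetic weight matrix are hypothetical. Replacing an m x n matrix by factors of shapes m x r and r x n stores r(m + n) values instead of mn, so a target parameter reduction bounds the usable rank.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Return (a, b) such that a @ b is the best rank-`rank` approximation of `weight`."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (m, rank); columns scaled by singular values
    b = vt[:rank, :]            # shape (rank, n)
    return a, b


def max_rank_for_reduction(m, n, reduction):
    """Largest rank r with r * (m + n) <= (1 - reduction) * m * n."""
    return int((1.0 - reduction) * m * n // (m + n))


# Hypothetical stand-in for a smooth layer: a rank-4 signal plus small noise.
rng = np.random.default_rng(0)
m, n = 64, 128
signal = rng.standard_normal((m, 4)) @ rng.standard_normal((4, n))
weight = signal + 0.01 * rng.standard_normal((m, n))

rank = max_rank_for_reduction(m, n, 0.70)  # keep at most 30% of the parameters
a, b = low_rank_factors(weight, rank)

kept = a.size + b.size
rel_err = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
print(f"rank={rank}, kept {kept} of {weight.size} parameters, relative error {rel_err:.4f}")
```

The approximation error stays small here only because the synthetic matrix is close to low rank by construction; that is the property the smoothness regularization is meant to encourage in trained weights.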

Transferrable Surrogates in Expressive Neural Architecture Search Spaces

Apr 18, 2025

Shiwen Qin, Gabriela Kadlecová, Martin Pilát, Shay B. Cohen, Roman Neruda, Elliot J. Crowley, Jovita Lukasik, Linus Ericsson

Abstract: Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation with the need for efficient evaluation of architectures to effectively search such spaces. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either using zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf LM have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performances, and iii) the surrogates can be further used directly as the search objective for huge speed-ups.

* Project page at: https://shiwenqin.github.io/TransferrableSurrogate/

Surprisingly Strong Performance Prediction with Neural Graph Features

Apr 25, 2024

Gabriela Kadlecová, Jovita Lukasik, Martin Pilát, Petra Vidnerová, Mahmoud Safari, Roman Neruda, Frank Hutter

Abstract: Performance prediction has been a key part of the neural architecture search (NAS) process, making it possible to speed up NAS algorithms by avoiding resource-consuming network training. Although many performance predictors correlate well with ground truth performance, they require training data in the form of trained networks. Recently, zero-cost proxies have been proposed as an efficient method to estimate network performance without any training. However, they are still poorly understood, exhibit biases with network properties, and their performance is limited. Inspired by the drawbacks of zero-cost proxies, we propose neural graph features (GRAF), simple-to-compute properties of architectural graphs. GRAF offers fast and interpretable performance prediction while outperforming zero-cost proxies and other common encodings. In combination with other zero-cost proxies, GRAF outperforms most existing performance predictors at a fraction of the cost.
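To make "simple-to-compute properties of architectural graphs" concrete, here is a minimal sketch of two such properties: operation counts and the shortest input-to-output path restricted to a set of operations. It is an illustration, not the GRAF implementation. The cell encoding (a dict from edges to operation names in the style of NAS-Bench-201), the function names, and the example cell are all assumptions.

```python
from collections import Counter, deque

# Operation names in the style of NAS-Bench-201 (assumed encoding).
OPS = ("none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3")


def op_counts(cell):
    """Count how often each operation appears on the edges of the cell."""
    counts = Counter(cell.values())
    return {op: counts.get(op, 0) for op in OPS}


def min_path_length(cell, allowed, source=0, target=3):
    """Breadth-first search using only edges whose operation is in `allowed`.

    Returns the number of edges on the shortest path, or None if no path exists.
    """
    adjacency = {}
    for (src, dst), op in cell.items():
        if op in allowed:
            adjacency.setdefault(src, []).append(dst)
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical cell with 4 nodes: node 0 is the input, node 3 the output.
cell = {
    (0, 1): "nor_conv_3x3",
    (0, 2): "skip_connect",
    (1, 2): "nor_conv_1x1",
    (0, 3): "none",
    (1, 3): "avg_pool_3x3",
    (2, 3): "nor_conv_3x3",
}

features = op_counts(cell)
features["min_path_conv"] = min_path_length(cell, {"nor_conv_1x1", "nor_conv_3x3"})
features["min_path_skip"] = min_path_length(cell, {"skip_connect"})
print(features)
```

A feature vector like `features` could then be fed to any tabular regressor as a performance predictor; which features and which predictor work best is the subject of the paper, not of this sketch.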

* 45 pages, 30 figures

Are Vision Language Models Texture or Shape Biased and Can We Steer Them?

Mar 14, 2024

Paul Gavrikov, Jovita Lukasik, Steffen Jung, Robert Geirhos, Bianca Lamm, Muhammad Jehanzeb Mirza, Margret Keuper, Janis Keuper

Abstract: Vision language models (VLMs) have drastically changed the computer vision model landscape in only a few years, opening an exciting array of new applications, from zero-shot image classification to image captioning and visual question answering. Unlike pure vision models, they offer an intuitive way to access visual content through language prompting. The wide applicability of such models encourages us to ask whether they also align with human vision: specifically, how far they adopt human-induced visual biases through multimodal fusion, or whether they simply inherit biases from pure vision models. One important visual bias is the texture vs. shape bias, or the dominance of local over global information. In this paper, we study this bias in a wide range of popular VLMs. Interestingly, we find that VLMs are often more shape-biased than their vision encoders, indicating that visual biases are modulated to some extent through text in multimodal models. If text does indeed influence visual biases, this suggests that we may be able to steer visual biases not just through visual input but also through language: a hypothesis that we confirm through extensive experiments. For instance, we are able to steer shape bias from as low as 49% to as high as 72% through prompting alone. For now, the strong human bias towards shape (96%) remains out of reach for all tested VLMs.
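The shape-bias percentages quoted above can be read through the commonly used cue-conflict definition: among decisions that match either the shape label or the texture label of an image, the fraction that match the shape label. The sketch below assumes this definition applies here; the function name `shape_bias` and the example decisions are hypothetical.

```python
def shape_bias(decisions):
    """Fraction of shape decisions among all shape-or-texture decisions.

    `decisions` is an iterable of (predicted, shape_label, texture_label)
    tuples for cue-conflict images, where the two labels differ.
    Predictions matching neither label are ignored.
    """
    shape_hits = 0
    texture_hits = 0
    for predicted, shape_label, texture_label in decisions:
        if predicted == shape_label:
            shape_hits += 1
        elif predicted == texture_label:
            texture_hits += 1
    total = shape_hits + texture_hits
    if total == 0:
        raise ValueError("no prediction matched a shape or texture label")
    return shape_hits / total


# Hypothetical model outputs on four cue-conflict images.
decisions = [
    ("cat", "cat", "elephant"),   # shape decision
    ("clock", "car", "clock"),    # texture decision
    ("dog", "dog", "bottle"),     # shape decision
    ("chair", "bird", "knife"),   # neither cue, ignored
]

bias = shape_bias(decisions)
print(f"shape bias: {bias:.2f}")
```

Under this definition a value of 0.5 means shape and texture cues win equally often, which gives a reference point for the 49%, 72%, and 96% figures in the abstract.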

An Evaluation of Zero-Cost Proxies -- from Neural Architecture Performance to Model Robustness

Jul 18, 2023

Jovita Lukasik, Michael Moeller, Margret Keuper

Abstract: Zero-cost proxies are nowadays frequently studied and used to search for neural architectures. They show an impressive ability to predict the performance of architectures by making use of their untrained weights. These techniques allow for immense search speed-ups. So far, the joint search for well-performing and robust architectures has received much less attention in the field of NAS. As a consequence, the main focus of zero-cost proxies is the clean accuracy of architectures, whereas model robustness should play an equally important part. In this paper, we analyze the ability of common zero-cost proxies to serve as performance predictors for robustness in the popular NAS-Bench-201 search space. We are interested in the single prediction task for robustness and the joint multi-objective of clean and robust accuracy. We further analyze the feature importance of the proxies and show that predicting robustness makes the prediction task from existing zero-cost proxies more challenging. As a result, the joint consideration of several proxies becomes necessary to predict a model's robustness, while the clean accuracy can be regressed from a single such feature.

* Accepted at DAGM GCPR 2023

Neural Architecture Design and Robustness: A Dataset

Jun 11, 2023

Steffen Jung, Jovita Lukasik, Margret Keuper

Abstract: Deep learning models have proven to be successful in a wide range of machine learning tasks. Yet, they are often highly sensitive to perturbations on the input data which can lead to incorrect decisions with high confidence, hampering their deployment for practical use-cases. Thus, finding architectures that are (more) robust against perturbations has received much attention in recent years. Just like the search for well-performing architectures in terms of clean accuracy, this usually involves a tedious trial-and-error process with one additional challenge: the evaluation of a network's robustness is significantly more expensive than its evaluation for clean accuracy. Thus, the aim of this paper is to facilitate better streamlined research on architectural design choices with respect to their impact on robustness as well as, for example, the evaluation of surrogate measures for robustness. We therefore borrow one of the most commonly considered search spaces for neural architecture search for image classification, NAS-Bench-201, which contains a manageable size of 6466 non-isomorphic network designs. We evaluate all these networks on a range of common adversarial attacks and corruption types and introduce a database on neural architecture design and robustness evaluations. We further present three exemplary use cases of this dataset, in which we (i) benchmark robustness measurements based on Jacobian and Hessian matrices for their robustness predictability, (ii) perform neural architecture search on robust accuracies, and (iii) provide an initial analysis of how architectural design choices affect robustness. We find that carefully crafting the topology of a network can have substantial impact on its robustness, where networks with the same parameter count range in mean adversarial robust accuracy from 20% to 41%. Code and data are available at http://robustness.vision/.

* ICLR 2023; project page: http://robustness.vision/

Learning Where To Look -- Generative NAS is Surprisingly Efficient

Mar 16, 2022

Jovita Lukasik, Steffen Jung, Margret Keuper

Abstract: The efficient, automated search for well-performing neural architectures (NAS) has drawn increasing attention in the recent past. The predominant research objective is to reduce the necessity of costly evaluations of neural architectures while efficiently exploring large search spaces. To this aim, surrogate models embed architectures in a latent space and predict their performance, while generative models for neural architectures enable optimization-based search within the latent space the generator draws from. Both surrogate and generative models aim to facilitate query-efficient search in a well-structured latent space. In this paper, we further improve the trade-off between query-efficiency and promising architecture generation by leveraging advantages from both efficient surrogate models and generative design. To this end, we propose a generative model, paired with a surrogate predictor, that iteratively learns to generate samples from increasingly promising latent subspaces. This approach leads to very effective and efficient architecture search, while keeping the query amount low. In addition, our approach makes it straightforward to jointly optimize for multiple objectives such as accuracy and hardware latency. We show the benefit of this approach not only w.r.t. the optimization of architectures for highest classification accuracy but also in the context of hardware constraints, and outperform state-of-the-art methods on several NAS benchmarks for single and multiple objectives. We also achieve state-of-the-art performance on ImageNet.

DARTS for Inverse Problems: a Study on Hyperparameter Sensitivity

Aug 12, 2021

Jonas Geiping, Jovita Lukasik, Margret Keuper, Michael Moeller

Abstract: Differentiable architecture search (DARTS) is a widely researched tool for neural architecture search, due to its promising results for image classification. The main benefit of DARTS is the effectiveness achieved through the weight-sharing one-shot paradigm, which allows efficient architecture search. In this work, we investigate DARTS in a systematic case study of inverse problems, which allows us to analyze these potential benefits in a controlled manner. Although we demonstrate that the success of DARTS can be extended from image classification to reconstruction, our experiments reveal three fundamental difficulties in the evaluation of DARTS-based methods: First, the results show a large variance in all test cases. Second, the final performance is highly dependent on the hyperparameters of the optimizer. And third, the performance of the weight-sharing architecture used during training does not reflect the final performance of the found architecture well. Thus, we conclude that it is necessary to 1) report the results of any DARTS-based method from several runs along with the underlying performance statistics, 2) show the correlation of the training and final architecture performance, and 3) carefully consider whether the computational efficiency of DARTS outweighs the costs of hyperparameter optimization and multiple runs.

* 11 pages, 5 figures

Neural Architecture Performance Prediction Using Graph Neural Networks

Oct 19, 2020

Jovita Lukasik, David Friede, Heiner Stuckenschmidt, Margret Keuper

Abstract: In computer vision research, the process of automating architecture engineering, Neural Architecture Search (NAS), has gained substantial interest. Due to the high computational costs, most recent approaches to NAS as well as the few available benchmarks only provide limited search spaces. In this paper, we propose a surrogate model for neural architecture performance prediction built upon Graph Neural Networks (GNN). We demonstrate the effectiveness of this surrogate model on neural architecture performance prediction for structurally unknown architectures (i.e., zero-shot prediction) by evaluating the GNN in several experiments on the NAS-Bench-101 dataset.

* camera ready version for DAGM GCPR 2020. arXiv admin note: substantial text overlap with arXiv:1912.05317

Smooth Variational Graph Embeddings for Efficient Neural Architecture Search

Oct 09, 2020

Jovita Lukasik, David Friede, Arber Zela, Heiner Stuckenschmidt, Frank Hutter, Margret Keuper

Abstract: In this paper, we propose an approach to neural architecture search (NAS) based on graph embeddings. NAS has been addressed previously using discrete, sampling-based methods, which are computationally expensive, as well as differentiable approaches, which come at lower costs but enforce stronger constraints on the search space. The proposed approach leverages advantages from both sides by building a smooth variational neural architecture embedding space in which we evaluate a structural subset of architectures at training time using the predicted performance, while being able to extrapolate from this subspace at inference time. We evaluate the proposed approach in the context of two common search spaces, the graph structure defined by the ENAS approach and the NAS-Bench-101 search space, and improve over the state of the art in both.

* 13 pages, 4 figures, 6 tables
