Linus Ericsson

Transferrable Surrogates in Expressive Neural Architecture Search Spaces

Apr 18, 2025

Shiwen Qin, Gabriela Kadlecová, Martin Pilát, Shay B. Cohen, Roman Neruda, Elliot J. Crowley, Jovita Lukasik, Linus Ericsson

Abstract: Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation with the need for efficient evaluation of architectures to effectively search such spaces. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either using zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf LM have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performances, and iii) the surrogates can be further used directly as the search objective for huge speed-ups.

* Project page at: https://shiwenqin.github.io/TransferrableSurrogate/
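
The filtering idea in point ii) of the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `surrogate_filtered_search`, the toy `(depth, width)` architecture encoding, `toy_surrogate`, and `true_accuracy` are all hypothetical stand-ins. The sketch assumes a cheap surrogate score is available for every candidate and that only the top-ranked fraction receives the expensive evaluation.

```python
def surrogate_filtered_search(candidates, surrogate, evaluate, keep_fraction=0.25):
    """Rank candidates with a cheap surrogate, then fully evaluate only the top fraction."""
    ranked = sorted(candidates, key=surrogate, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]
    # Only the shortlist pays the cost of a real evaluation.
    scores = {arch: evaluate(arch) for arch in shortlist}
    best = max(scores, key=scores.get)
    return best, scores[best], len(shortlist)


def true_accuracy(arch):
    """Hypothetical stand-in for an expensive train-and-validate run."""
    depth, width = arch
    return 1.0 - abs(depth - 6) * 0.05 - abs(width - 64) / 640


def toy_surrogate(arch):
    """Hypothetical cheap predictor: roughly right, slightly biased towards depth 5."""
    depth, width = arch
    return -abs(depth - 5) - abs(width - 64) / 32


# Toy search space: 10 depths x 4 widths = 40 candidate architectures.
candidates = [(d, w) for d in range(1, 11) for w in (16, 32, 64, 128)]
best, acc, n_evals = surrogate_filtered_search(candidates, toy_surrogate, true_accuracy)
print(best, acc, n_evals)
```

In this toy setting the imperfect surrogate still keeps the best candidate in its shortlist, so the search needs 10 full evaluations instead of 40.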

There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks

Nov 22, 2024

Miguel Espinosa, Chenhongyi Yang, Linus Ericsson, Steven McDonagh, Elliot J. Crowley

Abstract: The Segment Anything Model (SAM) was originally designed for label-agnostic mask generation. Does this model also possess inherent semantic understanding, of value to broader visual tasks? In this work we follow a multi-staged approach towards exploring this question. We firstly quantify SAM's semantic capabilities by comparing base image encoder efficacy under classification tasks, in comparison with established models (CLIP and DINOv2). Our findings reveal a significant lack of semantic discriminability in SAM feature representations, limiting potential for tasks that require class differentiation. This initial result motivates our exploratory study that attempts to enable semantic information via in-context learning with lightweight fine-tuning where we observe that generalisability to unseen classes remains limited. Our observations culminate in the proposal of a training-free approach that leverages DINOv2 features, towards better endowing SAM with semantic understanding and achieving instance-level class differentiation through feature-based similarity. Our study suggests that incorporation of external semantic sources provides a promising direction for the enhancement of SAM's utility with respect to complex visual tasks that require semantic understanding.

* Preprint. Work in progress

einspace: Searching for Neural Architectures from Fundamental Operations

May 31, 2024

Linus Ericsson, Miguel Espinosa, Chenhongyi Yang, Antreas Antoniou, Amos Storkey, Shay B. Cohen, Steven McDonagh, Elliot J. Crowley

Abstract: Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. create a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shifts, we need a novel expressive search space design which is built from more fundamental operations. To this end, we introduce einspace, a search space based on a parameterised probabilistic context-free grammar. Our space is versatile, supporting architectures of various sizes and complexities, while also containing diverse network operations which allow it to model convolutions, attention components and more. It contains many existing competitive architectures, and provides flexibility for discovering new ones. Using this search space, we perform experiments to find novel architectures as well as improvements on existing ones on the diverse Unseen NAS datasets. We show that competitive architectures can be obtained by searching from scratch, and we consistently find large improvements when initialising the search with strong baselines. We believe that this work is an important advancement towards a transformative NAS paradigm where search space expressivity and strategic search initialisation play key roles.

* Project page at https://linusericsson.github.io/einspace/
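
To make the notion of a search space defined by a probabilistic context-free grammar more concrete, here is a minimal sketch of sampling an architecture string from such a grammar. The grammar below is a hypothetical toy and is not the einspace grammar; the symbol names, rule probabilities, and the depth cap `max_depth` are assumptions made purely for illustration.

```python
import random

# Hypothetical toy grammar (NOT the einspace grammar).
# Each non-terminal maps to a list of (probability, right-hand side) rules.
GRAMMAR = {
    "NET": [(0.5, ["BLOCK"]), (0.5, ["BLOCK", "NET"])],
    "BLOCK": [(0.4, ["conv3x3"]), (0.3, ["attention"]), (0.3, ["BRANCH"])],
    "BRANCH": [(1.0, ["split", "BLOCK", "BLOCK", "merge"])],
}
TERMINALS = {"conv3x3", "attention", "split", "merge"}


def sample(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` recursively into a flat list of terminal operations."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth cap, take the shortest rule so the expansion terminates.
        rhs = min((r for _, r in rules), key=len)
    else:
        weights = [p for p, _ in rules]
        rhs = rng.choices([r for _, r in rules], weights=weights, k=1)[0]
    out = []
    for child in rhs:
        out.extend(sample(child, rng, depth + 1, max_depth))
    return out


arch = sample("NET", random.Random(0))
print(" ".join(arch))
```

Changing the rule probabilities shifts which architectures are likely to be sampled, which is the sense in which such a grammar is parameterised.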

Hyperparameter Selection in Continual Learning

Apr 09, 2024

Thomas L. Lee, Sigrid Passano Hellan, Linus Ericsson, Elliot J. Crowley, Amos Storkey

Abstract: In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unrealistic as in practice a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper answers this question by evaluating several realistic HPO frameworks. We find that all the HPO frameworks considered, including end-of-training HPO, perform similarly. We therefore advocate using the realistic and most computationally efficient method: fitting the hyperparameters on the first task and then fixing them throughout training.

* Preprint, 9 pages
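
The recommendation in the abstract above (fit hyperparameters on the first task, then keep them fixed) can be sketched on a toy problem. This is an illustrative sketch only, not the paper's experimental setup: the scalar regression model, the task stream, the candidate learning rates, and the helper names `train_on_task`, `mse`, and `select_lr_on_first_task` are all hypothetical.

```python
def train_on_task(weight, task, lr):
    """One SGD pass over a task for the scalar model y = weight * x (squared error)."""
    for x, y in task:
        weight -= lr * 2 * (weight * x - y) * x
    return weight


def mse(weight, data):
    """Mean squared error of the scalar model on a list of (x, y) pairs."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


def select_lr_on_first_task(first_task, candidate_lrs):
    """Pick the learning rate with the lowest validation error on the first task only."""
    split = len(first_task) // 2
    train, val = first_task[:split], first_task[split:]
    return min(candidate_lrs, key=lambda lr: mse(train_on_task(0.0, train, lr), val))


# Toy stream of three tasks, each seen exactly once; the target is y = 2x.
tasks = [
    [((3 * i + t + 1) / 30, 2 * (3 * i + t + 1) / 30) for i in range(10)]
    for t in range(3)
]

lr = select_lr_on_first_task(tasks[0], [0.001, 0.05, 0.5])  # tuned on task 0 only
weight = 0.0
for task in tasks:  # the chosen learning rate stays fixed for the whole stream
    weight = train_on_task(weight, task, lr)
print(lr, weight)
```

The hyperparameter search never touches later tasks, so the stream is traversed once, matching the realistic setting the abstract describes.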

PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition

Mar 26, 2024

Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, Elliot J. Crowley

Abstract: We present PlainMamba: a simple non-hierarchical state space model (SSM) designed for general visual recognition. The recent Mamba model has shown how SSMs can be highly competitive with other architectures on sequential data and initial attempts have been made to apply it to images. In this paper, we further adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images by (i) a continuous 2D scanning process that improves spatial continuity by ensuring adjacency of tokens in the scanning sequence, and (ii) direction-aware updating which enables the model to discern the spatial relations of tokens by encoding directional information. Our architecture is designed to be easy to use and easy to scale, formed by stacking identical PlainMamba blocks, resulting in a model with constant width throughout all layers. The architecture is further simplified by removing the need for special tokens. We evaluate PlainMamba on a variety of visual recognition tasks including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves performance gains over previous non-hierarchical models and is competitive with hierarchical alternatives. For tasks requiring high-resolution inputs, in particular, PlainMamba requires much less computing while maintaining high performance. Code and models are available at https://github.com/ChenhongyiYang/PlainMamba

Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations

Nov 15, 2023

Cian Eastwood, Julius von Kügelgen, Linus Ericsson, Diane Bouchacourt, Pascal Vincent, Bernhard Schölkopf, Mark Ibrahim

Abstract: Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and (multiple blocks of) style variables. We empirically demonstrate the benefits of our approach on synthetic datasets and then present promising but limited results on ImageNet.

Better Practices for Domain Adaptation

Sep 07, 2023

Linus Ericsson, Da Li, Timothy M. Hospedales

Abstract: Distribution shifts are all too common in real-world applications of machine learning. Domain adaptation (DA) aims to address this by providing various frameworks for adapting models to the deployment data without using labels. However, the domain shift scenario raises a second more subtle challenge: the difficulty of performing hyperparameter optimisation (HPO) for these adaptation algorithms without access to a labelled validation set. The unclear validation protocol for DA has led to bad practices in the literature, such as performing HPO using the target test labels when, in real-world scenarios, they are not available. This has resulted in over-optimism about DA research progress compared to reality. In this paper, we analyse the state of DA when using good evaluation practice, by benchmarking a suite of candidate validation criteria and using them to assess popular adaptation algorithms. We show that there are challenges across all three branches of domain adaptation methodology including Unsupervised Domain Adaptation (UDA), Source-Free Domain Adaptation (SFDA), and Test Time Adaptation (TTA). While the results show that realistically achievable performance is often worse than expected, they also show that using proper validation splits is beneficial, as well as showing that some previously unexplored validation metrics provide the best options to date. Altogether, our improved practices covering data, training, validation and hyperparameter optimisation form a new rigorous pipeline to improve benchmarking, and hence research progress, within this important field going forward.

* AutoML 2023 (Best paper award)

Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity

May 24, 2023

Raman Dutt, Linus Ericsson, Pedro Sanchez, Sotirios A. Tsaftaris, Timothy Hospedales

Abstract: We present a comprehensive evaluation of Parameter-Efficient Fine-Tuning (PEFT) techniques for diverse medical image analysis tasks. PEFT is increasingly exploited as a valuable approach for knowledge transfer from pre-trained models in natural language processing, vision, speech, and cross-modal tasks, such as vision-language and text-to-image generation. However, its application in medical image analysis remains relatively unexplored. As foundation models are increasingly exploited in the medical domain, it is crucial to investigate and comparatively assess various strategies for knowledge transfer that can bolster a range of downstream tasks. Our study, the first of its kind (to the best of our knowledge), evaluates 16 distinct PEFT methodologies proposed for convolutional and transformer-based networks, focusing on image classification and text-to-image generation tasks across six medical datasets ranging in size, modality, and complexity. Through a battery of more than 600 controlled experiments, we demonstrate performance gains of up to 22% under certain scenarios and demonstrate the efficacy of PEFT for medical text-to-image generation. Further, we reveal the instances where PEFT methods particularly dominate over conventional fine-tuning approaches by studying their relationship with downstream data volume.

Region Proposal Network Pre-Training Helps Label-Efficient Object Detection

Nov 16, 2022

Linus Ericsson, Nanqing Dong, Yongxin Yang, Ales Leonardis, Steven McDonagh

Abstract: Self-supervised pre-training, based on the pretext task of instance discrimination, has fueled the recent advance in label-efficient object detection. However, existing studies focus on pre-training only a feature extractor network to learn transferable representations for downstream detection tasks. This leads to the necessity of training multiple detection-specific modules from scratch in the fine-tuning phase. We argue that the region proposal network (RPN), a common detection-specific module, can additionally be pre-trained towards reducing the localization error of multi-stage detectors. In this work, we propose a simple pretext task that provides an effective pre-training for the RPN, towards efficiently improving downstream object detection performance. We evaluate the efficacy of our approach on benchmark object detection tasks and additional downstream tasks, including instance segmentation and few-shot detection. In comparison with multi-stage detectors without RPN pre-training, our approach is able to consistently improve downstream task performance, with largest gains found in label-scarce settings.

* Presented at NeurIPS 2022 Workshop: Self-Supervised Learning - Theory and Practice

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks

Nov 22, 2021

Linus Ericsson, Henry Gouk, Timothy M. Hospedales

Abstract: Self-supervised learning is a powerful paradigm for representation learning on unlabelled images. A wealth of effective new methods based on instance matching rely on data augmentation to drive learning, and these have reached a rough agreement on an augmentation scheme that optimises popular recognition benchmarks. However, there is strong reason to suspect that different tasks in computer vision require features to encode different (in)variances, and therefore likely require different augmentation strategies. In this paper, we measure the invariances learned by contrastive methods and confirm that they do learn invariance to the augmentations used and further show that this invariance largely transfers to related real-world changes in pose and lighting. We show that learned invariances strongly affect downstream task performance and confirm that different downstream tasks benefit from polar opposite (in)variances, leading to performance loss when the standard augmentation strategy is used. Finally, we demonstrate that a simple fusion of representations with complementary invariances ensures wide transferability to all the diverse downstream tasks considered.

* Code available at https://github.com/linusericsson/ssl-invariances
