Robert-Jan Bruintjes

Data-Efficient Challenges in Visual Inductive Priors: A Retrospective

Jun 10, 2025

Robert-Jan Bruintjes, Attila Lengyel, Osman Semih Kayhan, Davide Zambrano, Nergis Tömen, Hadi Jamali-Rad, Jan van Gemert

Abstract: Deep learning requires large amounts of data to train models that work well. In data-deficient settings, performance can be degraded. We investigate which deep learning methods benefit training models in a data-deficient setting by organizing the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" workshop series, featuring four editions of data-impaired challenges. These challenges address the problem of training deep learning models for computer vision tasks with limited data. Participants are limited to training models from scratch using a low number of training samples and are not allowed to use any form of transfer learning. We aim to stimulate the development of novel approaches that incorporate prior knowledge to improve the data efficiency of deep learning models. Successful challenge entries make use of large model ensembles that mix Transformers and CNNs, as well as heavy data augmentation. Novel prior knowledge-based methods contribute to success in some entries.
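
The abstract above credits large model ensembles that mix Transformers and CNNs. As a rough illustration only, the sketch below shows one common way to combine classifiers, averaging their per-class probabilities. The function names and the toy probabilities are hypothetical and are not taken from any challenge entry.

```python
from typing import List, Sequence


def average_ensemble(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities of several models for one sample."""
    if not prob_sets:
        raise ValueError("need at least one model prediction")
    num_classes = len(prob_sets[0])
    if any(len(p) != num_classes for p in prob_sets):
        raise ValueError("all models must predict the same number of classes")
    return [sum(p[c] for p in prob_sets) / len(prob_sets) for c in range(num_classes)]


def predict(prob_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = average_ensemble(prob_sets)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical softmax outputs of one CNN and two Transformers (assumed values).
cnn = [0.6, 0.3, 0.1]
vit_a = [0.2, 0.7, 0.1]
vit_b = [0.3, 0.6, 0.1]
ensemble_class = predict([cnn, vit_a, vit_b])
```

Here the CNN alone would pick class 0, while the averaged ensemble picks class 1, which is the kind of disagreement an ensemble is meant to smooth out.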

Learning to Adapt to Position Bias in Vision Transformer Classifiers

May 19, 2025

Robert-Jan Bruintjes, Jan van Gemert

Abstract: How discriminative position information is for image classification depends on the data. On the one hand, the camera position is arbitrary and objects can appear anywhere in the image, arguing for translation invariance. On the other hand, position information is key for exploiting capture/center bias and scene layout, e.g., the sky is up. We show that position bias, the degree to which a dataset is more easily solved when positional information on input features is used, plays a crucial role in the performance of Vision Transformer image classifiers. To investigate, we propose Position-SHAP, a direct measure of position bias obtained by extending SHAP to work with position embeddings. We show various levels of position bias in different datasets, and find that the optimal choice of position embedding depends on the position bias apparent in the dataset. We therefore propose Auto-PE, a single-parameter position embedding extension that allows the position embedding to modulate its norm, enabling the unlearning of position information. Auto-PE combines with existing PEs to match or improve accuracy on classification datasets.
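
To make the idea of a single parameter controlling how much position information reaches the model more concrete, here is a minimal sketch. It assumes the simplest possible mechanism, a scalar gate that rescales an additive position embedding; the function name, the gate, and the toy values are hypothetical, and this is not the authors' Auto-PE implementation.

```python
from typing import List

Matrix = List[List[float]]


def add_gated_position_embedding(tokens: Matrix, pos_embed: Matrix, gate: float) -> Matrix:
    """Add a position embedding scaled by one scalar to each token embedding.

    gate = 0 removes all position information; larger |gate| gives the
    position embedding a larger norm relative to the token content.
    """
    if len(tokens) != len(pos_embed):
        raise ValueError("need one position embedding per token")
    return [
        [t + gate * p for t, p in zip(tok, pe)]
        for tok, pe in zip(tokens, pos_embed)
    ]


# Two tokens with two features each (assumed toy values).
tokens = [[1.0, 2.0], [3.0, 4.0]]
pos_embed = [[0.5, -0.5], [1.0, 0.0]]

without_position = add_gated_position_embedding(tokens, pos_embed, gate=0.0)
with_position = add_gated_position_embedding(tokens, pos_embed, gate=2.0)
```

In a trainable model the gate would be a learnable scalar, so that gradient descent could shrink it toward zero on datasets where position is not informative.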

VIPriors 4: Visual Inductive Priors for Data-Efficient Deep Learning Challenges

Jun 26, 2024

Robert-Jan Bruintjes, Attila Lengyel, Marcos Baptista Rios, Osman Semih Kayhan, Davide Zambrano, Nergis Tomen, Jan van Gemert

Abstract: The fourth edition of the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" workshop features two data-impaired challenges. These challenges address the problem of training deep learning models for computer vision tasks with limited data. Participants are limited to training models from scratch using a low number of training samples and are not allowed to use any form of transfer learning. We aim to stimulate the development of novel approaches that incorporate inductive biases to improve the data efficiency of deep learning models. Significant advancements are made compared to the provided baselines, with winning solutions surpassing the baselines by a considerable margin in both tasks. As in previous editions, these achievements are primarily attributed to heavy use of data augmentation policies and large model ensembles, though novel prior-based methods seem to contribute more to successful solutions than last year. This report highlights the key aspects of the challenges and their outcomes.

Color Equivariant Convolutional Networks

Oct 30, 2023

Attila Lengyel, Ombretta Strafforello, Robert-Jan Bruintjes, Alexander Gielisse, Jan van Gemert

Abstract: Color is a crucial visual cue readily exploited by Convolutional Neural Networks (CNNs) for object recognition. However, CNNs struggle if there is data imbalance between color variations introduced by accidental recording conditions. Color invariance addresses this issue but does so at the cost of removing all color information, which sacrifices discriminative power. In this paper, we propose Color Equivariant Convolutions (CEConvs), a novel deep learning building block that enables shape feature sharing across the color spectrum while retaining important color information. We extend the notion of equivariance from geometric to photometric transformations by incorporating parameter sharing over hue-shifts in a neural network. We demonstrate the benefits of CEConvs in terms of downstream performance on various tasks and improved robustness to color changes, including train-test distribution shifts. Our approach can be seamlessly integrated into existing architectures, such as ResNets, and offers a promising solution for addressing color-based domain shifts in CNNs.

* NeurIPS 2023. Code available on https://github.com/Attila94/ceconv
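
The abstract above refers to parameter sharing over hue shifts. One common way to model a hue shift, assumed here purely for illustration, is a rotation of the RGB vector about the gray axis (1, 1, 1). The sketch below builds that rotation with Rodrigues' formula; the function names are hypothetical and the CEConv layer itself is not reproduced.

```python
import math
from typing import List, Sequence


def hue_rotation_matrix(theta: float) -> List[List[float]]:
    """3x3 rotation by angle theta (radians) about the gray axis (1, 1, 1)."""
    c, s = math.cos(theta), math.sin(theta)
    a = (1.0 - c) / 3.0       # contribution of the axis outer product
    b = s / math.sqrt(3.0)    # contribution of the cross-product matrix
    return [
        [c + a, a - b, a + b],
        [a + b, c + a, a - b],
        [a - b, a + b, c + a],
    ]


def shift_hue(rgb: Sequence[float], theta: float) -> List[float]:
    """Apply the hue rotation to a single RGB pixel."""
    m = hue_rotation_matrix(theta)
    return [sum(m[i][j] * rgb[j] for j in range(3)) for i in range(3)]


third_turn = 2.0 * math.pi / 3.0
red_shifted = shift_hue([1.0, 0.0, 0.0], third_turn)   # approximately pure green
gray_shifted = shift_hue([0.5, 0.5, 0.5], third_turn)  # gray lies on the axis, so it is unchanged
```

Under this model a shift by a third of a turn simply permutes the color channels, which is why a discrete group of hue shifts is a natural candidate for weight sharing.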

Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models

Aug 24, 2023

Jan Warchocki, Teodor Oprescu, Yunhan Wang, Alexandru Damacus, Paul Misterka, Robert-Jan Bruintjes, Attila Lengyel, Ombretta Strafforello, Jan van Gemert

Abstract: In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models requires access to large amounts of data and computational power. However, gathering such data is challenging and computational resources might be limited. This work explores and measures how current deep temporal action localization models perform in settings constrained by the amount of data or computational power. We measure data efficiency by training each model on a subset of the training set. We find that TemporalMaxer outperforms other models in data-limited settings. Furthermore, we recommend TriDet when training time is limited. To test the efficiency of the models during inference, we pass videos of different lengths through each model. We find that TemporalMaxer requires the least computational resources, likely due to its simple architecture.

* Accepted to the CVEU workshop at ICCV 2023
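
The data-efficiency protocol described above trains each model on a subset of the training set. A minimal sketch of drawing reproducible subsets at several fractions follows; the function name, the fractions, and the video identifiers are hypothetical, and the paper's actual sampling procedure may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_training_subset(items: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Draw a reproducible random subset containing roughly `fraction` of the items."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    k = max(1, round(fraction * len(items)))
    return rng.sample(list(items), k)


# Hypothetical video identifiers standing in for a training set.
train_videos = [f"video_{i:03d}" for i in range(100)]
subsets = {frac: sample_training_subset(train_videos, frac, seed=42) for frac in (0.1, 0.25, 0.5, 1.0)}
```

Fixing the seed means every model is compared on the same subsets, so differences in accuracy are not caused by different draws of the data.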

VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges

May 31, 2023

Robert-Jan Bruintjes, Attila Lengyel, Marcos Baptista Rios, Osman Semih Kayhan, Davide Zambrano, Nergis Tomen, Jan van Gemert

Abstract: The third edition of the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" workshop featured four data-impaired challenges, focusing on addressing the limitations of data availability in training deep learning models for computer vision tasks. The challenges comprised four distinct data-impaired tasks, where participants were required to train models from scratch using a reduced number of training samples. The primary objective was to encourage novel approaches that incorporate relevant inductive biases to enhance the data efficiency of deep learning models. To foster creativity and exploration, participants were strictly prohibited from utilizing pre-trained checkpoints and other transfer learning techniques. Significant advancements were made compared to the provided baselines, with winning solutions surpassing the baselines by a considerable margin in all four tasks. These achievements were primarily attributed to the effective utilization of extensive data augmentation policies, model ensembling techniques, and the implementation of data-efficient training methods, including self-supervised representation learning. This report highlights the key aspects of the challenges and their outcomes.

* arXiv admin note: text overlap with arXiv:2201.08625

What Affects Learned Equivariance in Deep Image Recognition Models?

Apr 07, 2023

Robert-Jan Bruintjes, Tomasz Motyka, Jan van Gemert

Abstract: Equivariance w.r.t. geometric transformations in neural networks improves data efficiency, parameter efficiency and robustness to out-of-domain perspective shifts. When equivariance is not designed into a neural network, the network can still learn equivariant functions from the data. We quantify this learned equivariance by proposing an improved measure for equivariance. We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet. We therefore investigate what can increase the learned equivariance in neural networks, and find that data augmentation, reduced model capacity and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.

* Accepted at CVPR workshop L3D-IVU 2023
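
To illustrate what measuring translation equivariance can look like, the sketch below compares f(shift(x)) with shift(f(x)) on a 1D signal and reports a relative error. This is a generic textbook-style check under assumed names and toy data, not the improved measure proposed in the paper.

```python
from typing import Callable, List

Signal = List[float]


def circular_shift(x: Signal, s: int) -> Signal:
    """Shift a signal to the right by s positions with wrap-around."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


def circular_conv(x: Signal, kernel: Signal) -> Signal:
    """Circular cross-correlation, which is translation equivariant by construction."""
    n = len(x)
    return [sum(kernel[j] * x[(i + j) % n] for j in range(len(kernel))) for i in range(n)]


def equivariance_error(f: Callable[[Signal], Signal], x: Signal, s: int) -> float:
    """Relative L2 distance between f(shift(x)) and shift(f(x)); 0 means equivariant."""
    a = f(circular_shift(x, s))
    b = circular_shift(f(x), s)
    num = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    den = sum(q * q for q in b) ** 0.5
    return num / den if den else num


x = [1.0, 2.0, 0.0, -1.0, 3.0]
conv_error = equivariance_error(lambda v: circular_conv(v, [0.25, 0.5, 0.25]), x, s=2)
# A position-dependent map (each value scaled by its index) is not equivariant.
position_error = equivariance_error(lambda v: [val * i for i, val in enumerate(v)], x, s=2)
```

For a trained network, the same comparison would be applied to intermediate feature maps and averaged over many inputs and shifts.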

VIPriors 2: Visual Inductive Priors for Data-Efficient Deep Learning Challenges

Jan 21, 2022

Attila Lengyel, Robert-Jan Bruintjes, Marcos Baptista Rios, Osman Semih Kayhan, Davide Zambrano, Nergis Tomen, Jan van Gemert

Abstract: The second edition of the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" challenges featured five data-impaired challenges, where models are trained from scratch on a reduced number of training samples for various key computer vision tasks. To encourage new and creative ideas on incorporating relevant inductive biases to improve the data efficiency of deep learning models, we prohibited the use of pre-trained checkpoints and other transfer learning techniques. The provided baselines are outperformed by a large margin in all five challenges, mainly thanks to extensive data augmentation policies, model ensembling, and data-efficient network architectures.

* 11 pages, 11 figures

Domain Adaptation for Rare Classes Augmented with Synthetic Samples

Oct 23, 2021

Tuhin Das, Robert-Jan Bruintjes, Attila Lengyel, Jan van Gemert, Sara Beery

Abstract: To alleviate lower classification performance on rare classes in imbalanced datasets, a possible solution is to augment the underrepresented classes with synthetic samples. Domain adaptation can be incorporated in a classifier to decrease the domain discrepancy between real and synthetic samples. While domain adaptation is generally applied on completely synthetic source domains and real target domains, we explore how domain adaptation can be applied when only a single rare class is augmented with simulated samples. As a testbed, we use a camera trap animal dataset with a rare deer class, which is augmented with synthetic deer samples. We adapt existing domain adaptation methods to two new methods for the single rare class setting: DeerDANN, based on the Domain-Adversarial Neural Network (DANN), and DeerCORAL, based on deep correlation alignment (Deep CORAL) architectures. Experiments show that DeerDANN yields the highest improvement in deer classification accuracy over the baseline, 24.0%, versus 22.4% for DeerCORAL. Further, both methods require fewer than the 10k synthetic samples used by the baseline to achieve these higher accuracies. DeerCORAL requires the fewest synthetic samples (2k deer), followed by DeerDANN (8k deer).

* 14 pages, 6 figures, to be published
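
DeerCORAL is described above as building on Deep CORAL, which aligns the second-order statistics of source and target features. As a hedged illustration, the sketch below computes the generic CORAL loss, the squared Frobenius distance between the two feature covariance matrices divided by 4d², in plain Python. The function names and toy features are hypothetical, and the single-rare-class adaptation from the paper is not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def covariance(rows: Matrix) -> Matrix:
    """Unbiased feature covariance of a batch (rows are samples, columns are features)."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two samples")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]


def coral_loss(source: Matrix, target: Matrix) -> float:
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq_dist = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq_dist / (4.0 * d * d)


# Assumed toy features: "synthetic" features are the "real" ones stretched by 3.
real = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
synthetic = [[3.0 * v for v in row] for row in real]
shifted = [[v + 5.0 for v in row] for row in real]

loss_same = coral_loss(real, real)
loss_scaled = coral_loss(synthetic, real)
loss_shifted = coral_loss(shifted, real)
```

The loss ignores a constant offset between domains and penalizes only differences in spread and correlation, so in practice it is combined with a classification loss.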

FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes

Oct 18, 2021

David W. Romero, Robert-Jan Bruintjes, Jakub M. Tomczak, Erik J. Bekkers, Mark Hoogendoorn, Jan C. van Gemert

Abstract: When designing Convolutional Neural Networks (CNNs), one must select the size of the convolutional kernels before training. Recent works show CNNs benefit from different kernel sizes at different layers, but exploring all possible combinations is infeasible in practice. A more efficient approach is to learn the kernel size during training. However, existing works that learn the kernel size have a limited bandwidth. These approaches scale kernels by dilation, and thus the detail they can describe is limited. In this work, we propose FlexConv, a novel convolutional operation with which high bandwidth convolutional kernels of learnable kernel size can be learned at a fixed parameter cost. FlexNets model long-term dependencies without the use of pooling, achieve state-of-the-art performance on several sequential datasets, outperform recent works with learned kernel sizes, and are competitive with much deeper ResNets on image benchmark datasets. Additionally, FlexNets can be deployed at higher resolutions than those seen during training. To avoid aliasing, we propose a novel kernel parameterization with which the frequency of the kernels can be analytically controlled. Our novel kernel parameterization shows higher descriptive power and faster convergence speed than existing parameterizations. This leads to important improvements in classification accuracy.

* First two authors contributed equally to this work
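
One way to picture a kernel whose size can be learned at a fixed parameter cost is a continuous kernel function multiplied by a mask whose width is a parameter. The sketch below assumes a Gaussian mask and uses a cosine as a stand-in for the kernel function; the names, the threshold, and the stand-in kernel are hypothetical, and this is not the FlexConv implementation.

```python
import math
from typing import Callable, List


def gaussian_mask(pos: float, sigma: float) -> float:
    """Gaussian window centered at 0; sigma controls the effective kernel size."""
    return math.exp(-0.5 * (pos / sigma) ** 2)


def sample_masked_kernel(
    kernel_fn: Callable[[float], float],
    sigma: float,
    max_size: int = 33,
    threshold: float = 0.1,
) -> List[float]:
    """Sample a continuous kernel on [-1, 1], mask it, and crop negligible taps."""
    half = max_size // 2
    positions = [i / half for i in range(-half, half + 1)]
    return [
        kernel_fn(p) * gaussian_mask(p, sigma)
        for p in positions
        if gaussian_mask(p, sigma) >= threshold  # drop taps the mask has suppressed
    ]


def stand_in_kernel(p: float) -> float:
    """Hypothetical continuous kernel; a small neural network could play this role."""
    return math.cos(3.0 * p)


wide_kernel = sample_masked_kernel(stand_in_kernel, sigma=1.0)
narrow_kernel = sample_masked_kernel(stand_in_kernel, sigma=0.1)
```

Because the mask value varies smoothly with sigma, a framework with automatic differentiation could adjust sigma by gradient descent, while the number of parameters of the kernel function stays the same for any kernel size.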
