Abstract:The widespread popularity of equivariant networks underscores the significance of parameter efficient models and effective use of training data. At a time when robustness to unseen deformations is becoming increasingly important, we present H-NeXt, which bridges the gap between equivariance and invariance. H-NeXt is a parameter-efficient roto-translation invariant network that is trained without a single augmented image in the training set. Our network comprises three components: an equivariant backbone for learning roto-translation independent features, an invariant pooling layer for discarding roto-translation information, and a classification layer. H-NeXt outperforms the state of the art in classification on unaugmented training sets and augmented test sets of MNIST and CIFAR-10.
Abstract:This paper proposes a novel approach to real-time automatic rim detection, classification, and inspection by combining traditional computer vision and deep learning techniques. At the end of every automotive assembly line, a quality control process is carried out to identify any potential defects in the produced cars. Common yet hazardous defects are related, for example, to incorrectly mounted rims. Routine inspections are mostly conducted by human workers that are negatively affected by factors such as fatigue or distraction. We have designed a new prototype to validate whether all four wheels on a single car match in size and type. Additionally, we present three comprehensive open-source databases, CWD1500, WHEEL22, and RB600, for wheel, rim, and bolt detection, as well as rim classification, which are free-to-use for scientific purposes.
Abstract:We introduce NeRD, a new demosaicking method for generating full-color images from Bayer patterns. Our approach leverages advancements in neural fields to perform demosaicking by representing an image as a coordinate-based neural network with sine activation functions. The inputs to the network are spatial coordinates and a low-resolution Bayer pattern, while the outputs are the corresponding RGB values. An encoder network, which is a blend of ResNet and U-net, enhances the implicit neural representation of the image to improve its quality and ensure spatial consistency through prior learning. Our experimental results demonstrate that NeRD outperforms traditional and state-of-the-art CNN-based methods and significantly closes the gap to transformer-based methods.
Abstract:Blur is an image degradation that is difficult to remove. Invariants with respect to blur offer an alternative way of a~description and recognition of blurred images without any deblurring. In this paper, we present an original unified theory of blur invariants. Unlike all previous attempts, the new theory does not require any prior knowledge of the blur type. The invariants are constructed in the Fourier domain by means of orthogonal projection operators and moment expansion is used for efficient and stable computation. It is shown that all blur invariants published earlier are just particular cases of this approach. Experimental comparison to concurrent approaches shows the advantages of the proposed theory.