Abstract:Confronting the critical challenge of insufficient training data in the field of complex image recognition, this paper introduces a novel 3D viewpoint augmentation technique specifically tailored for wine label recognition. This method enhances deep learning model performance by generating visually realistic training samples from a single real-world wine label image, overcoming the challenges posed by the intricate combinations of text and logos. Classical Generative Adversarial Network (GAN) methods fall short in synthesizing such intricate content combination. Our proposed solution leverages time-tested computer vision and image processing strategies to expand our training dataset, thereby broadening the range of training samples for deep learning applications. This innovative approach to data augmentation circumvents the constraints of limited training resources. Using the augmented training images through batch-all triplet metric learning on a Vision Transformer (ViT) architecture, we can get the most discriminative embedding features for every wine label, enabling us to perform one-shot recognition of existing wine labels in the training classes or future newly collected wine labels unavailable in the training. Experimental results show a significant increase in recognition accuracy over conventional 2D data augmentation techniques.
Abstract:In computer vision, finding correct point correspondence among images plays an important role in many applications, such as image stitching, image retrieval, visual localization, etc. Most of the research works focus on the matching of local feature before a sampling method is employed, such as RANSAC, to verify initial matching results via repeated fitting of certain global transformation among the images. However, incorrect matches may still exist. Thus, a novel sampling scheme, Pentagon-Match (PMatch), is proposed in this work to verify the correctness of initially matched keypoints using pentagons randomly sampled from them. By ensuring shape and location of these pentagons are view-invariant with various evaluations of cross-ratio (CR), incorrect matches of keypoint can be identified easily with homography estimated from correctly matched pentagons. Experimental results show that highly accurate estimation of homography can be obtained efficiently for planar scenes of the HPatches dataset, based on keypoint matching results provided by LoFTR. Besides, accurate outlier identification for the above matching results and possible extension of the approach for multi-plane situation are also demonstrated.
Abstract:In computer vision, finding point correspondence among images plays an important role in many applications, such as image stitching, image retrieval, visual localization, etc. Most of the research worksfocus on the matching of local feature before a sampling method is employed, such as RANSAC, to verify initial matching results via repeated fitting of certain global transformation among the images. However, incorrect matches may still exist, while careful examination of such problems is often skipped. Accordingly, a geometrically constrained algorithm is proposed in this work to verify the correctness of initially matched SIFT keypoints based on view-invariant cross-ratios (CRs). By randomly forming pentagons from these keypoints and matching their shape and location among images with CRs, robust planar region estimation can be achieved efficiently for the above verification, while correct and incorrect matches of keypoints can be examined easily with respect to those shape and location matched pentagons. Experimental results show that satisfactory results can be obtained for various scenes with single as well as multiple planar regions.