Abstract: Logical anomalies (LA) refer to data that violate underlying logical constraints, such as the quantity, arrangement, or composition of components within an image. Accurately detecting such anomalies requires models to reason about various component types through segmentation. However, curating pixel-level annotations for semantic segmentation is both time-consuming and expensive. Although some few-shot and unsupervised co-part segmentation algorithms exist, they often fail on images of industrial objects, whose components have similar textures and shapes that make precise differentiation challenging. In this study, we introduce a novel component segmentation model for LA detection that leverages a few labeled samples and unlabeled images that share the same logical constraints. To ensure consistent segmentation across unlabeled images, we employ a histogram matching loss in conjunction with an entropy loss. Because segmentation predictions play a crucial role, we propose to enhance both local and global sample validity detection by capturing key aspects of visual semantics via three memory banks: class histograms, component composition embeddings, and patch-level representations. For effective LA detection, we propose an adaptive scaling strategy that standardizes the anomaly scores from the different memory banks during inference. Extensive experiments on the public benchmark MVTec LOCO AD show that our method achieves 98.1% AUROC in LA detection, compared with 89.6% for competing methods.
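The abstract does not give the exact form of the losses or of the adaptive scaling, so the following is only a minimal sketch under stated assumptions. Here the entropy loss is the mean per-pixel Shannon entropy of softmax outputs. The histogram matching loss is assumed to be an L1 distance between an image's soft class histogram (average probability per class) and a reference histogram. The adaptive scaling is assumed to be a z-score using per-bank mean and standard deviation measured on normal samples. All function names are hypothetical.

```python
import math


def entropy_loss(probs):
    """Mean per-pixel Shannon entropy; probs is a list of per-pixel class distributions."""
    total = 0.0
    for p in probs:
        total -= sum(q * math.log(q) for q in p if q > 0)
    return total / len(probs)


def class_histogram(probs):
    """Soft class histogram: average probability mass of each class over all pixels."""
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def histogram_matching_loss(probs, reference_hist):
    """Assumed form: L1 distance between the image's soft histogram and a reference one."""
    hist = class_histogram(probs)
    return sum(abs(h - r) for h, r in zip(hist, reference_hist))


def standardize(score, mean, std, eps=1e-8):
    """Z-score a raw anomaly score with statistics collected on normal samples."""
    return (score - mean) / (std + eps)


def combined_anomaly_score(raw_scores, stats):
    """Sum standardized scores across memory banks.

    raw_scores: {bank_name: raw score}; stats: {bank_name: (mean, std)}.
    """
    return sum(standardize(raw_scores[k], *stats[k]) for k in raw_scores)
```

Standardizing before summing keeps a bank whose raw distances happen to be large (for example, patch-level distances) from dominating banks with smaller numeric ranges.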
Abstract: Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given only a few labeled samples. Recent approaches pre-train a feature extractor and then fine-tune it through episodic meta-learning. Other methods leverage spatial features to learn pixel-level correspondence while jointly training a classifier. However, these approaches yield only marginal improvements. In this paper, inspired by the transformer-style self-attention mechanism, we propose a strategy to cross-attend and re-weight discriminative features for few-shot classification. Given base representations of support and query images after global pooling, we introduce a single shared module that projects features and cross-attends in two directions: (i) query to support, and (ii) support to query. The module computes attention scores between features to produce an attention-pooled representation of same-class features, which is added to the original representation and then passed through a projection head. This re-weights features in both directions (i and ii), yielding features better suited to metric-based meta-learning. Extensive experiments on public benchmarks show that our approach outperforms state-of-the-art methods by 3% to 5%.
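As an illustration of the cross-attend, add, and project pattern described above, here is a minimal sketch using scaled dot-product attention on globally pooled feature vectors. The projection matrices `Wq`, `Wk`, `Wv`, and `Wo` are hypothetical stand-ins for the module's learned weights. Because the same weights serve both calls, one function covers both directions. `Wv` must be square so that the residual addition matches the input dimension.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attend(target, others, Wq, Wk, Wv, Wo):
    """Re-weight `target` by attending over `others` (a non-empty list of pooled features).

    Scaled dot-product attention with shared projections, a residual connection
    back to the original feature, then a projection head Wo.
    """
    q = matvec(Wq, target)
    keys = [matvec(Wk, o) for o in others]
    values = [matvec(Wv, o) for o in others]
    d = len(q)
    weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    fused = [t + p for t, p in zip(target, pooled)]  # residual: add to the original
    return matvec(Wo, fused)
```

Under these assumptions, query-to-support attention would be `cross_attend(query, support_feats, ...)`, and support-to-query attention would be `cross_attend(support_feat, [query], ...)`, called with the same four matrices.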
Abstract: Few-shot segmentation (FSS) aims to learn pixel-level classification of a target object in a query image using only a few annotated support samples. This is challenging because it requires modeling appearance variations of target objects and the diverse visual cues between query and support images with limited information. To address this problem, we propose a semi-supervised FSS strategy that leverages additional prototypes from unlabeled images with uncertainty-guided pseudo-label refinement. To obtain reliable prototypes from unlabeled images, we meta-train a neural network to jointly predict segmentation and estimate the uncertainty of its predictions. We use the uncertainty estimates to exclude highly uncertain predictions when constructing pseudo labels, and then derive additional prototypes from the refined pseudo labels. During inference, query segmentation is predicted using prototypes from both support and unlabeled images, together with low-level features of the query images. Our approach is end-to-end and can easily supplement existing approaches without requiring additional training to exploit unlabeled samples. Extensive experiments on PASCAL-$5^i$ and COCO-$20^i$ demonstrate that our model can effectively remove unreliable predictions to refine pseudo labels and significantly improves upon state-of-the-art performance.
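The refinement step can be sketched as follows, assuming the network outputs a per-pixel foreground probability and a scalar uncertainty. Pixels above a fixed uncertainty cutoff are ignored, and prototypes are formed by masked average pooling, a common choice in prototype-based FSS that the abstract does not explicitly confirm. The function names and both thresholds are hypothetical.

```python
def refine_pseudo_labels(fg_probs, uncertainties, threshold=0.5, max_uncertainty=0.2):
    """Binarize foreground probabilities, excluding pixels that are too uncertain.

    Returns per-pixel labels: 1 = foreground, 0 = background, None = ignored.
    """
    labels = []
    for p, u in zip(fg_probs, uncertainties):
        if u > max_uncertainty:
            labels.append(None)  # unreliable prediction: left out of the pseudo label
        else:
            labels.append(1 if p >= threshold else 0)
    return labels


def masked_average_prototype(features, labels, target=1):
    """Average the feature vectors of pixels whose pseudo label equals `target`."""
    selected = [f for f, lab in zip(features, labels) if lab == target]
    if not selected:
        return None  # no reliable pixels of this label in the image
    dim = len(selected[0])
    return [sum(f[i] for f in selected) / len(selected) for i in range(dim)]
```

Ignored pixels contribute to neither the foreground nor the background prototype. Under this sketch, a confidently wrong but highly uncertain pixel (the second one in the test below) cannot pull the prototype away from the object.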
Abstract: Non-rigid registration is a necessary but challenging task in medical imaging studies. Recently, unsupervised registration models have shown good performance, but they often require large-scale training datasets and long training times. Therefore, in real-world applications where only dozens to hundreds of image pairs are available, existing models are impractical. To address these limitations, we propose a novel unsupervised registration model integrated with a gradient-based meta-learning framework. In particular, we train a meta-learner that finds an initialization of the model parameters using a variety of existing registration datasets. To enable quick adaptation to various tasks, the meta-learner is updated to move toward the center of the parameters fine-tuned for each registration task. As a result, our model can adapt to tasks from unseen domains through a short fine-tuning process and perform accurate registration. To verify the superiority of our model, we train it on various 2D medical image registration tasks, such as retinal choroid Optical Coherence Tomography Angiography (OCTA), CT organ, and brain MRI scans, and test it on the registration of retinal OCTA Superficial Capillary Plexus (SCP) images. In our experiments, the proposed model achieved significantly better accuracy and shorter training time than other registration models.
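Moving the meta-learner toward the center of the task-specific fine-tuned parameters resembles a Reptile-style first-order update. The sketch below assumes that reading, with toy quadratic losses standing in for registration tasks and flat parameter lists standing in for network weights. Names, learning rates, and step counts are hypothetical and are not taken from the paper.

```python
def fine_tune(theta, grad_fn, lr=0.1, steps=5):
    """Plain gradient descent on one task, starting from the shared initialization."""
    phi = list(theta)
    for _ in range(steps):
        g = grad_fn(phi)
        phi = [p - lr * gi for p, gi in zip(phi, g)]
    return phi


def meta_update(theta, task_grad_fns, meta_lr=0.5, **ft_kwargs):
    """Move the initialization toward the mean (center) of the fine-tuned parameters."""
    tuned = [fine_tune(theta, g, **ft_kwargs) for g in task_grad_fns]
    center = [sum(p[i] for p in tuned) / len(tuned) for i in range(len(theta))]
    return [t + meta_lr * (c - t) for t, c in zip(theta, center)]


def quadratic_task(target):
    """Toy task: loss = sum((theta - target)^2), so the gradient is 2 * (theta - target)."""
    return lambda th: [2 * (t - c) for t, c in zip(th, target)]
```

With two toy tasks whose optima are 1.0 and 3.0, repeated meta-updates drive the initialization to 2.0. That point is equally close to both optima, so a few fine-tuning steps suffice for either task.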
Abstract: Segmentation of organs of interest in 3D medical images is necessary for accurate diagnosis and longitudinal studies. Although recent advances in deep learning have shown success on many segmentation tasks, high performance requires large datasets, and the annotation process is both time-consuming and labor-intensive. In this paper, we propose a 3D few-shot segmentation framework for accurate organ segmentation using a limited number of annotated training samples of the target organ. To achieve this, we design a U-Net-like network that predicts segmentation by learning the relationship between 2D slices of the support data and a query image; the network includes a bidirectional gated recurrent unit (GRU) that learns the consistency of encoded features between adjacent slices. We also introduce a transfer learning method that adapts the model to the characteristics of the target image and organ by updating it before testing, using arbitrary support and query data sampled from the support data. We evaluate the proposed model on three 3D CT datasets with annotations of different organs. Our model yielded significantly improved performance over state-of-the-art few-shot segmentation models and was comparable to a fully supervised model trained with more target training data.
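The pre-test adaptation step samples pseudo support and query data from the annotated support set alone. A minimal sketch of such episode sampling is shown below. It assumes at least two annotated volumes, holds one out as the pseudo query, and crops a contiguous window of slices from each volume so that the adjacent-slice context a bidirectional GRU would consume is preserved. The data layout, window size, and function names are hypothetical.

```python
import random


def crop_window(volume, n_slices, rng):
    """Take a random contiguous run of n_slices slices from one volume."""
    start = rng.randrange(len(volume) - n_slices + 1)
    return volume[start:start + n_slices]


def sample_adaptation_episode(support_volumes, n_support=1, n_slices=3, rng=None):
    """Build a pseudo (support, query) episode using only the labeled support data.

    support_volumes: list of volumes, each a list of (image_slice, mask_slice) pairs.
    Returns (support_windows, query_window, query_idx, support_idxs).
    """
    rng = rng or random.Random()
    if len(support_volumes) < n_support + 1:
        raise ValueError("need at least n_support + 1 annotated volumes")
    picked = rng.sample(range(len(support_volumes)), n_support + 1)
    query_idx, support_idxs = picked[0], picked[1:]
    query = crop_window(support_volumes[query_idx], n_slices, rng)
    support = [crop_window(support_volumes[i], n_slices, rng) for i in support_idxs]
    return support, query, query_idx, support_idxs
```

Each sampled episode supplies ground-truth masks for its pseudo query, so the model can be updated with an ordinary supervised loss before it is applied to the real, unlabeled query.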
Abstract: Brain-computer interfaces (BCIs) based on electroencephalography (EEG) signals, in particular motor imagery (MI) data, have received considerable attention and show potential for the design of key technologies in healthcare and other industries. MI data are generated when a subject imagines limb movement and can be used to aid rehabilitation as well as in autonomous driving scenarios. Thus, classification of MI signals is vital for EEG-based BCI systems. Recently, MI EEG classification techniques using deep learning have shown improved performance over conventional techniques. However, due to inter-subject variability, the scarcity of data from unseen subjects, and a low signal-to-noise ratio, extracting robust features and improving accuracy remain challenging. In this context, we propose a novel two-way few-shot network that efficiently learns how to learn representative features of unseen subject categories and how to classify them with limited MI EEG data. The pipeline includes an embedding module that learns feature representations from a set of samples, an attention mechanism for discovering key signal features, and a relation module for final classification based on relation scores between a support set and a query signal. In addition to unifying the learning of feature similarity and a few-shot classifier, our method emphasizes informative features in the support data that are relevant to the query data, which improves generalization to unseen subjects. For evaluation, we used the BCI Competition IV 2b dataset and achieved a 9.3% accuracy improvement in the 20-shot classification task, reaching state-of-the-art performance. Experimental results demonstrate the effectiveness of employing attention and the overall generality of our method.
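The sketch below illustrates how query-relevant support features can be emphasized before scoring. Each class's support embeddings are weighted by their softmax-normalized dot-product similarity to the query, and the resulting prototype is then scored against the query. In the paper the relation score comes from a learned relation module; here, as a plainly stated substitution, cosine similarity stands in for it, and plain vectors stand in for the learned EEG embeddings. All names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)


def attended_prototype(query, support):
    """Weight each support embedding by its softmax-normalized similarity to the query."""
    scores = [dot(query, s) for s in support]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * s[i] for w, s in zip(weights, support)) for i in range(len(query))]


def classify(query, support_by_class):
    """Return (best_class, scores), where each score is a relation score per class.

    Cosine similarity is a stand-in for a learned relation module.
    """
    scores = {c: cosine(query, attended_prototype(query, s)) for c, s in support_by_class.items()}
    return max(scores, key=scores.get), scores
```

For the two-way setting, `support_by_class` would hold two entries, for example left-hand and right-hand imagery, each with its few labeled embeddings.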