Abstract:Language Bottleneck Models (LBMs) are proposed to achieve interpretable image recognition by classifying images based on textual concept bottlenecks. However, current LBMs simply list all concepts together as the bottleneck layer, leading to the spurious cue inference problem and cannot generalized to unseen classes. To address these limitations, we propose the Attribute-formed Language Bottleneck Model (ALBM). ALBM organizes concepts in the attribute-formed class-specific space, where concepts are descriptions of specific attributes for specific classes. In this way, ALBM can avoid the spurious cue inference problem by classifying solely based on the essential concepts of each class. In addition, the cross-class unified attribute set also ensures that the concept spaces of different classes have strong correlations, as a result, the learned concept classifier can be easily generalized to unseen classes. Moreover, to further improve interpretability, we propose Visual Attribute Prompt Learning (VAPL) to extract visual features on fine-grained attributes. Furthermore, to avoid labor-intensive concept annotation, we propose the Description, Summary, and Supplement (DSS) strategy to automatically generate high-quality concept sets with a complete and precise attribute. Extensive experiments on 9 widely used few-shot benchmarks demonstrate the interpretability, transferability, and performance of our approach. The code and collected concept sets are available at https://github.com/tiggers23/ALBM.
Abstract:Zero-Shot Learning (ZSL) aims to learn recognition models for recognizing new classes without labeled data. In this work, we propose a novel approach dubbed Transferrable Semantic-Visual Relation (TSVR) to facilitate the cross-category transfer in transductive ZSL. Our approach draws on an intriguing insight connecting two challenging problems, i.e. domain adaptation and zero-shot learning. Domain adaptation aims to transfer knowledge across two different domains (i.e., source domain and target domain) that share the identical task/label space. For ZSL, the source and target domains have different tasks/label spaces. Hence, ZSL is usually considered as a more difficult transfer setting compared with domain adaptation. Although the existing ZSL approaches use semantic attributes of categories to bridge the source and target domains, their performances are far from satisfactory due to the large domain gap between different categories. In contrast, our method directly transforms ZSL into a domain adaptation task through redrawing ZSL as predicting the similarity/dissimilarity labels for the pairs of semantic attributes and visual features. For this redrawn domain adaptation problem, we propose to use a domain-specific batch normalization component to reduce the domain discrepancy of semantic-visual pairs. Experimental results over diverse ZSL benchmarks clearly demonstrate the superiority of our method.