Title: DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Jul 04, 2022

Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Wenting Song, Huajun Chen

Abstract: Zero-shot learning (ZSL) aims to predict unseen classes whose samples never appear during training, often using additional semantic information (a.k.a. side information) to bridge the training (seen) classes and the unseen classes. Among the most effective and widely used forms of semantic information for zero-shot image classification are attributes, which are annotations of class-level visual characteristics. However, because fine-grained annotations are scarce and attributes are imbalanced and tend to co-occur, current methods often fail to discriminate subtle visual distinctions between images, which limits their performance. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pretrained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images, (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics in the presence of attribute co-occurrence and imbalance, and (3) propose a multi-task learning policy for considering multi-model objectives. With extensive experiments on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark, we find that DUET can often achieve state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
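To make the role of attributes concrete, here is a minimal sketch of generic attribute-based zero-shot prediction: an image is described by predicted attribute scores, and the unseen class whose attribute vector matches best is chosen. This is not DUET's architecture or training objective. The attribute names, class names, scores, and the helper `predict_unseen` are all hypothetical and exist only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_unseen(attribute_scores, class_attributes):
    """Return the class whose attribute vector is most similar to the predicted scores."""
    return max(
        class_attributes,
        key=lambda name: cosine(attribute_scores, class_attributes[name]),
    )


# Hypothetical attribute order: [striped, has_wings, four_legged].
# Neither class needs training images; each is described only by its attributes.
UNSEEN_CLASSES = {
    "zebra": [1.0, 0.0, 1.0],
    "eagle": [0.0, 1.0, 0.0],
}

# Hypothetical attribute scores, standing in for the output of a model
# trained only on seen classes.
scores = [0.9, 0.1, 0.8]
prediction = predict_unseen(scores, UNSEEN_CLASSES)
print(prediction)
```

The sketch also shows why attribute quality matters: if two classes differ in only one rarely annotated attribute, a small error in that attribute's score is enough to flip the prediction, which is the kind of fine-grained confusion the abstract describes.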

* Work in progress
