Text-based person search is a challenging task that aims to retrieve pedestrian images of the same identity from an image gallery given a query text description. In recent years, text-based person search has made considerable progress, and state-of-the-art methods achieve strong performance by learning local fine-grained correspondences between images and texts. However, existing methods explicitly extract image parts and text phrases through hand-crafted splitting or external tools and then conduct complex cross-modal local matching. Moreover, they seldom consider the information inequality between modalities caused by image-specific information. In this paper, we propose an efficient joint Information and Semantic Alignment Network (ISANet) for text-based person search. Specifically, we first design an image-specific information suppression module, which suppresses image backgrounds and environmental factors by relation-guided localization and channel attention filtration, respectively. This design effectively alleviates the information inequality problem and achieves information alignment between images and texts. Secondly, we propose an implicit local alignment module that adaptively aggregates image and text features onto a set of modality-shared semantic topic centers, implicitly learning local fine-grained correspondences between images and texts without additional supervision or complex cross-modal interactions. Moreover, a global alignment is introduced as a supplement to the local perspective. Extensive experiments on multiple datasets demonstrate the effectiveness and superiority of the proposed ISANet.
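As a rough illustration of the implicit local alignment idea described above, the sketch below aggregates image patch features and text token features onto a shared set of learnable semantic topic centers via attention and scores their topic-wise agreement. This is a minimal sketch under assumed design choices, not the authors' implementation; the module name, dimensions, attention form, and cosine-similarity scoring are all illustrative assumptions.

```python
# Minimal sketch of implicit local alignment via modality-shared topic centers.
# Hypothetical illustration only; names, dimensions, and losses are assumptions,
# not the ISANet reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitLocalAlignment(nn.Module):
    def __init__(self, dim=512, num_topics=8):
        super().__init__()
        # Modality-shared, learnable semantic topic centers (K, dim).
        self.topics = nn.Parameter(torch.randn(num_topics, dim) * 0.02)
        self.scale = dim ** -0.5

    def aggregate(self, feats):
        # feats: (B, N, dim) local features (image patches or text tokens).
        # Each topic center attends over the local features of one sample.
        attn = torch.softmax(self.topics @ feats.transpose(1, 2) * self.scale, dim=-1)  # (B, K, N)
        return attn @ feats  # (B, K, dim): one aggregated feature per topic center

    def forward(self, img_feats, txt_feats):
        # Aggregate both modalities onto the same topic centers, then compare
        # topic-wise with cosine similarity (no explicit part/phrase supervision).
        img_topics = F.normalize(self.aggregate(img_feats), dim=-1)
        txt_topics = F.normalize(self.aggregate(txt_feats), dim=-1)
        align = (img_topics * txt_topics).sum(-1).mean()  # mean topic-wise similarity
        return img_topics, txt_topics, align

# Usage with random features standing in for backbone outputs.
module = ImplicitLocalAlignment(dim=512, num_topics=8)
img = torch.randn(4, 48, 512)   # e.g. 48 image patch features per image
txt = torch.randn(4, 32, 512)   # e.g. 32 word/token features per caption
_, _, score = module(img, txt)
```

Because the topic centers are shared between modalities, training that increases the topic-wise similarity pulls corresponding image regions and text phrases toward the same centers, which is one plausible way local correspondence can emerge without external part detectors or hand-crafted splits.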