Title: Exploring Part-Informed Visual-Language Learning for Person Re-Identification

Aug 04, 2023

Yin Lin, Cong Liu, Yehansen Chen, Jinshui Hu, Bing Yin, Baocai Yin, Zengfu Wang

Abstract: Recently, visual-language learning has shown great potential in enhancing visual-based person re-identification (ReID). Existing visual-language learning-based ReID methods often focus on whole-body image-text feature alignment while neglecting supervision on fine-grained part features. This choice simplifies the learning process but cannot guarantee semantic consistency of features within each part, which hinders final performance. Therefore, we propose to enhance fine-grained visual features with part-informed language supervision for ReID tasks. The proposed method, named Part-Informed Visual-language Learning ($\pi$-VL), suggests that (i) a human parsing-guided prompt tuning strategy and (ii) a hierarchical fusion-based visual-language alignment paradigm play essential roles in ensuring within-part feature semantic consistency. Specifically, we combine identity labels and parsing maps to constitute pixel-level text prompts and fuse multi-stage visual features with a lightweight auxiliary head to perform fine-grained image-text alignment. As a plug-and-play and inference-free solution, $\pi$-VL achieves substantial improvements over previous state-of-the-art methods on four commonly used ReID benchmarks, notably reporting 90.3% Rank-1 and 76.5% mAP on the most challenging MSMT17 database without bells and whistles.
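
The fine-grained image-text alignment mentioned in the abstract can be pictured with a toy example. The sketch below is not the authors' implementation: it only assumes that each pixel feature is matched against one text embedding per body part, with the parsing map supplying the target part for each pixel. The function name `pixel_text_alignment_loss`, the argument layout, and the temperature of 0.07 are all hypothetical choices made for illustration, and the identity-label component of the prompts is left out.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pixel_text_alignment_loss(pixel_feats, part_text_embeds, parsing_labels,
                              temperature=0.07):
    """Mean cross-entropy between pixel features and part text embeddings.

    Illustrative assumptions (not taken from the paper):
      pixel_feats      -- one D-dimensional feature vector per pixel
      part_text_embeds -- one D-dimensional text embedding per body part
      parsing_labels   -- per pixel, the index of its part in the parsing map
      temperature      -- softmax temperature, an assumed value
    """
    if not pixel_feats:
        raise ValueError("pixel_feats must not be empty")
    texts = [l2_normalize(t) for t in part_text_embeds]
    total = 0.0
    for feat, label in zip(pixel_feats, parsing_labels):
        f = l2_normalize(feat)
        # Cosine similarity of this pixel to every part prompt, scaled.
        logits = [sum(a * b for a, b in zip(f, t)) / temperature for t in texts]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the part given by the parsing map.
        total += log_sum_exp - logits[label]
    return total / len(pixel_feats)


# Two pixels, two parts: features that match their part's text embedding
# should give a lower loss than features paired with the wrong part.
pixels = [[1.0, 0.0], [0.0, 1.0]]
part_texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = pixel_text_alignment_loss(pixels, part_texts, [0, 1])
misaligned = pixel_text_alignment_loss(pixels, part_texts, [1, 0])
```

In this toy setup the loss is small when each pixel's feature points toward the text embedding of its own part and large when the parsing labels are swapped, which is the kind of within-part consistency the abstract argues for.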

* 11 pages, 5 figures