Title: Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

Aug 24, 2023

Siming Fu, Xiaoxuan He, Xinpeng Ding, Yuchen Cao, Hualiang Wang

Abstract: Large-scale pre-trained vision-language models have recently shown benefits for alleviating class imbalance in long-tailed recognition. However, a long-tailed data distribution can corrupt the representation space: the distance between head and tail categories becomes much larger than the distance between two tail categories. This uneven feature-space distribution leaves the model with unclear, poorly separated decision boundaries on a uniformly distributed test set, which lowers its performance. To address these challenges, we propose a uniformly distributed category prototype-guided vision-language framework that mitigates the feature-space bias caused by data imbalance. Specifically, we generate a set of category prototypes uniformly distributed on a hypersphere. A category prototype-guided mechanism for image-text matching makes the features of different classes converge to these distinct, uniformly distributed prototypes, which maintains a uniform distribution in the feature space and sharpens class boundaries. Additionally, our proposed irrelevant-text filtering and attribute enhancement module lets the model ignore irrelevant, noisy text and focus on key attribute information, improving the robustness of the framework. In the image-recognition fine-tuning stage, to address the positive bias problem of the learnable classifier, we design a class feature prototype-guided classifier, which compensates for the performance of tail classes while maintaining that of head classes. Our method outperforms previous vision-language methods for long-tailed learning by a large margin and achieves state-of-the-art performance.

* 11 pages, 5 figures
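
The abstract mentions generating category prototypes that are uniformly distributed on a hypersphere but does not say how. As a rough illustration only, the sketch below spreads unit vectors apart by projected gradient descent on a soft version of the maximum pairwise cosine similarity. The function name `uniform_prototypes`, the loss, and all hyperparameters are assumptions made for this example, not the authors' procedure.

```python
import numpy as np


def uniform_prototypes(num_classes, dim, steps=500, lr=0.1, temperature=0.1, seed=0):
    """Return `num_classes` unit vectors in R^dim pushed apart from each other.

    Hypothetical sketch: minimises sum_i t * logsumexp_j(cos(p_i, p_j) / t),
    a smooth stand-in for each prototype's largest similarity to another one.
    """
    rng = np.random.default_rng(seed)
    protos = rng.standard_normal((num_classes, dim))
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)

    for _ in range(steps):
        sim = protos @ protos.T              # pairwise cosine similarities
        np.fill_diagonal(sim, -np.inf)       # ignore self-similarity
        # Row-wise softmax weights: nearer neighbours get larger weight.
        w = np.exp((sim - sim.max(axis=1, keepdims=True)) / temperature)
        w /= w.sum(axis=1, keepdims=True)
        # Gradient of the loss above with respect to each prototype.
        grad = (w + w.T) @ protos
        protos -= lr * grad                  # step away from close neighbours
        protos /= np.linalg.norm(protos, axis=1, keepdims=True)  # back onto the sphere
    return protos


def max_offdiag_cosine(protos):
    """Largest cosine similarity between two distinct prototypes."""
    sim = protos @ protos.T
    np.fill_diagonal(sim, -np.inf)
    return float(sim.max())


if __name__ == "__main__":
    before = uniform_prototypes(10, 16, steps=0)
    after = uniform_prototypes(10, 16)
    print("max cosine before:", max_offdiag_cosine(before))
    print("max cosine after: ", max_offdiag_cosine(after))
```

In a setup like the one described, such fixed prototypes could serve as per-class targets that image and text features are pulled toward, so that head and tail classes each occupy a comparable region of the feature space regardless of how many training samples they have.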
