Title: Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Jun 09, 2023

Tianzhe Chu, Shengbang Tong, Tianjiao Ding, Xili Dai, Benjamin David Haeffele, René Vidal, Yi Ma

Abstract: The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, a fundamental and classic machine learning problem, still lacks an effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful feature representations of large pre-trained models such as CLIP and clusters images effectively and efficiently at scale. We show that the pre-trained features become significantly more structured when further optimized with the rate reduction objective. The resulting features can significantly improve clustering accuracy, e.g., from 57% to 66% on ImageNet-1k. Furthermore, by leveraging CLIP's image-text binding, we show how the new clustering method leads to a simple yet effective self-labeling algorithm that works successfully on large unlabeled datasets such as MS-COCO and LAION-Aesthetics. We will release the code at https://github.com/LeslieTrue/CPP.

* 21 pages, 13 figures
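
To make the "rate reduction objective" mentioned in the abstract more concrete, here is a minimal NumPy sketch of the commonly used hard-label form of coding rate reduction: the coding rate of all features minus the size-weighted coding rates of each cluster. This is an illustration under stated assumptions, not the paper's implementation: the function names `coding_rate` and `rate_reduction`, the distortion parameter `eps`, and the use of hard cluster labels are all choices made for this example, and the released code may differ (for instance by using soft cluster memberships).

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T).

    Z is a (d, n) array holding n feature vectors of dimension d as columns.
    `eps` is an assumed distortion parameter chosen for illustration.
    """
    d, n = Z.shape
    if n == 0:
        return 0.0
    gram = np.eye(d) + (d / (n * eps ** 2)) * (Z @ Z.T)
    # slogdet is numerically safer than log(det(...)); gram is positive definite.
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Rate reduction: R(Z) minus the size-weighted sum of per-cluster rates.

    `labels` is a length-n integer array of hard cluster assignments
    (an assumption of this sketch).
    """
    _, n = Z.shape
    labels = np.asarray(labels)
    within = 0.0
    for k in np.unique(labels):
        Zk = Z[:, labels == k]
        within += (Zk.shape[1] / n) * coding_rate(Zk, eps)
    return coding_rate(Z, eps) - within

# Toy data: two clusters lying on orthogonal axes of a 4-dimensional space.
d, per_cluster = 4, 10
Z = np.zeros((d, 2 * per_cluster))
Z[0, :per_cluster] = 1.0   # cluster 0 along the first axis
Z[1, per_cluster:] = 1.0   # cluster 1 along the second axis

good_labels = np.array([0] * per_cluster + [1] * per_cluster)
bad_labels = np.arange(2 * per_cluster) % 2  # mixes both clusters evenly

good = rate_reduction(Z, good_labels)
bad = rate_reduction(Z, bad_labels)
print(f"rate reduction, matching labels: {good:.3f}")
print(f"rate reduction, mixed labels:    {bad:.3f}")
```

On this toy data the partition that matches the two orthogonal groups scores higher than the mixed partition, which is the intuition behind using rate reduction to make features more cluster-friendly: a good clustering keeps the whole feature set expansive while each cluster stays compact.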