https://github.com/dengxiaozhi/HTCN.
Contrastive deep clustering has recently gained significant attention with its ability of joint contrastive learning and clustering via deep neural networks. Despite the rapid progress, previous works mostly require both positive and negative sample pairs for contrastive clustering, which rely on a relative large batch-size. Moreover, they typically adopt a two-stream architecture with two augmented views, which overlook the possibility and potential benefits of multi-stream architectures (especially with heterogeneous or hybrid networks). In light of this, this paper presents a new end-to-end deep clustering approach termed Heterogeneous Tri-stream Clustering Network (HTCN). The tri-stream architecture in HTCN consists of three main components, including two weight-sharing online networks and a target network, where the parameters of the target network are the exponential moving average of that of the online networks. Notably, the two online networks are trained by simultaneously (i) predicting the instance representations of the target network and (ii) enforcing the consistency between the cluster representations of the target network and that of the two online networks. Experimental results on four challenging image datasets demonstrate the superiority of HTCN over the state-of-the-art deep clustering approaches. The code is available at