Abstract:In federated clustering, multiple data-holding clients collaboratively group data without exchanging raw data. This field has seen notable advancements through its marriage with contrastive learning, exemplified by Cluster-Contrastive Federated Clustering (CCFC). However, CCFC suffers from heterogeneous data across clients, leading to poor and unrobust performance. Our study conducts both empirical and theoretical analyses to understand the impact of heterogeneous data on CCFC. Findings indicate that increased data heterogeneity exacerbates dimensional collapse in CCFC, evidenced by increased correlations across multiple dimensions of the learned representations. To address this, we introduce a decorrelation regularizer to CCFC. Benefiting from the regularizer, the improved method effectively mitigates the detrimental effects of data heterogeneity, and achieves superior performance, as evidenced by a marked increase in NMI scores, with the gain reaching as high as 0.32 in the most pronounced case.