Abstract:In this paper, we present a novel deep image clustering approach termed PICI, which enforces the partial information discrimination and the cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which the masked image modeling with two paralleled augmented views is formulated. After deriving the class tokens from the masked images by the Transformer encoder, three partial information learning modules are further incorporated, including the PISD module for training the auto-encoder via masked image reconstruction, the PICD module for employing two levels of contrastive learning, and the CLI module for mutual interaction between the instance-level and cluster-level subspaces. Extensive experiments have been conducted on six real-world image datasets, which demononstrate the superior clustering performance of the proposed PICI approach over the state-of-the-art deep clustering approaches. The source code is available at https://github.com/Regan-Zhang/PICI.
Abstract:Multi-view subspace clustering aims to discover the hidden subspace structures from multiple views for robust clustering, and has been attracting considerable attention in recent years. Despite significant progress, most of the previous multi-view subspace clustering algorithms are still faced with two limitations. First, they usually focus on the consistency (or commonness) of multiple views, yet often lack the ability to capture the cross-view inconsistencies in subspace representations. Second, many of them overlook the local structures of multiple views and cannot jointly leverage multiple local structures to enhance the subspace representation learning. To address these two limitations, in this paper, we propose a jointly smoothed multi-view subspace clustering (JSMC) approach. Specifically, we simultaneously incorporate the cross-view commonness and inconsistencies into the subspace representation learning. The view-consensus grouping effect is presented to jointly exploit the local structures of multiple views to regularize the view-commonness representation, which is further associated with the low-rank constraint via the nuclear norm to strengthen its cluster structure. Thus the cross-view commonness and inconsistencies, the view-consensus grouping effect, and the low-rank representation are seamlessly incorporated into a unified objective function, upon which an alternating optimization algorithm is performed to achieve a robust subspace representation for clustering. Experimental results on a variety of real-world multi-view datasets have confirmed the superiority of the proposed approach.