Abstract:Despite tremendous advancements in Artificial Intelligence, learning from large sets of data in an unsupervised manner remains a significant challenge. Classical clustering algorithms often fail to discover complex dependencies in large datasets, especially considering sparse, high-dimensional spaces. However, deep learning techniques proved to be successful when dealing with large quantities of data, efficiently reducing their dimensionality without losing track of underlying information. Several interesting advancements have already been made to combine deep learning and clustering. Still, the idea of enhancing the clustering results by combining multiple views of the data generated by deep neural networks appears to be insufficiently explored yet. This paper aims to investigate this direction and bridge the gap between deep neural networks, clustering techniques and ensemble learning methods. To achieve this goal, we propose a novel deep clustering ensemble method - Snapshot Spectral Clustering, designed to maximize the gain from combining multiple data views while minimizing the computational costs of creating the ensemble. Comparative analysis and experiments described in this paper prove the proposed concept, while the conducted hyperparameter study provides a valuable intuition to follow when selecting proper values.
Abstract:The paper presents a cursory examination of clustering, focusing on a rarely explored field of hierarchy of clusters. Based on this, a short discussion of clustering quality measures is presented and the F-score measure is examined more deeply. As there are no attempts to assess the quality for hierarchies of clusters, three variants of the F-Score based index are presented: classic, hierarchical and partial order. The partial order index is the authors' approach to the subject. Conducted experiments show the properties of the considered measures. In conclusions, the strong and weak sides of each variant are presented.