Abstract: The identification of key nodes in complex networks is an important topic in many areas of network science. It is vital to a variety of real-world applications, including viral marketing, epidemic spreading and influence maximization. In recent years, machine learning algorithms have been shown to outperform conventional, centrality-based methods in accuracy and consistency, but this approach still requires further refinement. What information about the influencers can be extracted from the network? How can we precisely obtain the labels required for training? Can these models generalize well? In this paper, we answer these questions by presenting an enhanced machine learning-based framework for the influence spread problem. We focus on identifying key nodes for the Independent Cascade model, which is a popular reference method. Our main contribution is an improved process of obtaining the labels required for training by introducing 'Smart Bins' and proving their advantage over known methods. Next, we show that our methodology allows ML models not only to predict the influence of a given node, but also to determine other characteristics of the spreading process, which is another novelty in the relevant literature. Finally, we extensively test our framework and its ability to generalize beyond complex networks of different types and sizes, gaining important insight into the properties of these methods.
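For readers unfamiliar with the Independent Cascade model mentioned above, the following minimal sketch estimates a node's influence by Monte Carlo simulation. It is illustrative only and is not the labelling procedure or the 'Smart Bins' method of the paper: the function names `simulate_ic` and `estimate_influence`, the toy graph, and the single uniform activation probability `p` are all assumptions made for this example.

```python
import random


def simulate_ic(graph, seed_node, p, rng):
    """Run one Independent Cascade from seed_node; return the set of activated nodes.

    graph is an adjacency dict {node: [neighbours]}; p is an assumed uniform
    activation probability applied to every edge.
    """
    active = {seed_node}
    frontier = [seed_node]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly activated node gets exactly one chance to
                # activate each of its still-inactive neighbours.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_influence(graph, seed_node, p=0.1, runs=1000, seed=0):
    """Average cascade size over `runs` simulations (a Monte Carlo estimate)."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same estimate
    total = sum(len(simulate_ic(graph, seed_node, p, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy network: node 0 is a hub, node 4 is a peripheral node.
graph = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1]}

if __name__ == "__main__":
    for node in graph:
        print(node, estimate_influence(graph, node, p=0.5))
```

Averaged cascade sizes of this kind are one plausible source of per-node influence scores; how such scores are converted into training labels is the subject of the paper itself and is not reproduced here.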
Abstract: Despite tremendous advancements in Artificial Intelligence, learning from large sets of data in an unsupervised manner remains a significant challenge. Classical clustering algorithms often fail to discover complex dependencies in large datasets, especially in sparse, high-dimensional spaces. However, deep learning techniques have proved successful when dealing with large quantities of data, efficiently reducing their dimensionality without losing track of the underlying information. Several interesting advancements have already been made in combining deep learning and clustering. Still, the idea of enhancing clustering results by combining multiple views of the data generated by deep neural networks appears to be insufficiently explored. This paper aims to investigate this direction and bridge the gap between deep neural networks, clustering techniques and ensemble learning methods. To achieve this goal, we propose a novel deep clustering ensemble method, Snapshot Spectral Clustering, designed to maximize the gain from combining multiple data views while minimizing the computational cost of creating the ensemble. The comparative analysis and experiments described in this paper validate the proposed concept, while the hyperparameter study provides useful intuition for selecting suitable values.