Abstract: The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distributions for both degrees and community sizes. The model generates graphs similar to those produced by the well-known LFR model, but it is faster and can be investigated analytically. In this paper, we show that the ABCD model exhibits some interesting self-similar behaviour, namely, the degree distribution of ground-truth communities is asymptotically the same as the degree distribution of the whole graph (appropriately normalized based on their sizes). As a result, we can estimate not only the number of edges induced by each community but also the number of self-loops and multi-edges generated during the process. Understanding these quantities is important as (a) rewiring self-loops and multi-edges to keep the graph simple is an expensive part of the algorithm, and (b) every rewiring causes the underlying configuration models to deviate slightly from uniform simple graphs on their corresponding degree sequences.
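As a concrete illustration of the last point, the snippet below is a minimal sketch (not the ABCD generator itself): it samples a configuration multigraph on a placeholder power-law-like degree sequence with networkx and counts the self-loops and excess parallel edges that a rewiring step would have to remove to obtain a simple graph; the sequence length, exponent, and seed are arbitrary choices made only for the example.

```python
import networkx as nx

# Placeholder power-law-like degree sequence (not the ABCD degree model).
degrees = [max(1, int(d)) for d in nx.utils.powerlaw_sequence(1000, exponent=2.5)]
if sum(degrees) % 2 == 1:        # the configuration model needs an even degree sum
    degrees[0] += 1

M = nx.configuration_model(degrees, seed=42)   # random multigraph with these degrees

self_loops = nx.number_of_selfloops(M)
# Collapsing to a simple graph drops repeated edges (and repeated self-loops);
# the difference in edge counts measures how much rewiring would be needed.
excess = M.number_of_edges() - nx.Graph(M).number_of_edges()

print(f"self-loops: {self_loops}, excess parallel edges: {excess}")
```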
Abstract: An embedding is a mapping from a set of nodes of a network into a real vector space. Embeddings can have various aims, such as capturing the underlying graph topology and structure, node-to-node relationships, or other relevant information about the graph, its subgraphs, or its nodes. A practical challenge with using embeddings is that there are many available variants to choose from. Selecting a small set of the most promising embeddings from the long list of possible options for a given task is challenging and often requires domain expertise. Embeddings can be categorized into two main types: classical embeddings and structural embeddings. Classical embeddings focus on learning both local and global proximity of nodes, while structural embeddings learn information specifically about the local structure of nodes' neighbourhoods. For classical node embeddings, there exists a framework that helps data scientists identify (in an unsupervised way) a few embeddings that are worth further investigation. Unfortunately, no such framework exists for structural embeddings. In this paper, we propose a framework for unsupervised ranking of structural graph embeddings. The proposed framework, apart from assigning an aggregate quality score to a structural embedding, additionally gives a data scientist insight into the properties of this embedding. It reports which predefined node features the embedding learns, how well it learns them, and which dimensions of the embedded space represent these features. With this information, the user gains a level of explainability for an otherwise complex, black-box embedding algorithm.
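As a toy illustration of the kind of diagnostic such a framework produces (this is not the proposed framework, just a sketch under simplified assumptions), one can regress a few predefined structural node features, such as degree and local clustering coefficient, on the embedding's dimensions and inspect both the fit quality and the most informative dimensions; the random embedding below is a placeholder for the output of an actual structural embedding algorithm.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy graph and a placeholder embedding; in practice the matrix would come
# from a structural embedding algorithm rather than a random generator.
G = nx.karate_club_graph()
rng = np.random.default_rng(0)
embedding = rng.normal(size=(G.number_of_nodes(), 8))   # n x d matrix

# Predefined structural node features the embedding is expected to capture.
features = {
    "degree": np.array([d for _, d in G.degree()], dtype=float),
    "clustering": np.array([nx.clustering(G, v) for v in G.nodes()]),
}

for name, y in features.items():
    model = LinearRegression().fit(embedding, y)
    r2 = model.score(embedding, y)                   # in-sample fit, for brevity
    top_dims = np.argsort(-np.abs(model.coef_))[:3]  # dimensions carrying the feature
    print(f"{name}: R^2 = {r2:.2f}, most informative dimensions: {top_dims}")
```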
Abstract: The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distributions for both degrees and community sizes. The model generates graphs with properties similar to those of the well-known LFR model, and its main parameter $\xi$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $\mu$. In this paper, we investigate various theoretical asymptotic properties of the ABCD model. In particular, we analyze the modularity function, arguably the most important graph property in the context of community detection. Indeed, the modularity function is often used to measure the presence of community structure in networks. It is also used as a quality function in many community detection algorithms, including the widely used Louvain algorithm.
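For reference, the modularity function referred to here is the standard (Newman-Girvan) one: for a graph $G=(V,E)$ and a partition $\mathbf{A}=\{A_1,\dots,A_\ell\}$ of $V$ into communities,
\[
q(\mathbf{A}) \;=\; \sum_{i=1}^{\ell} \left( \frac{e(A_i)}{|E|} - \left( \frac{\sum_{v\in A_i}\deg(v)}{2|E|} \right)^{2} \right),
\]
where $e(A_i)$ is the number of edges with both endpoints in $A_i$; the modularity of $G$ is then the maximum of $q(\mathbf{A})$ over all partitions $\mathbf{A}$ of $V$.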
Abstract: A graph embedding is a transformation of the vertices of a graph into a set of vectors. A good embedding should capture the graph topology, vertex-to-vertex relationships, and other relevant information about the graph, its subgraphs, and its vertices. If these objectives are achieved, an embedding is a meaningful, understandable, and compressed representation of a network. Embeddings also provide data scientists with more options and tools, as machine learning directly on graphs is still quite limited. Finally, vector operations are simpler and faster than comparable operations on graphs. The main challenge is to make sure that an embedding describes the properties of the graph well. In particular, one has to decide on the embedding dimensionality, which strongly affects the quality of an embedding. As a result, selecting the best embedding is a challenging task that very often requires domain expertise. In this paper, we propose a ``divergence score'' that can be assigned to various embeddings to distinguish good ones from bad ones. This general framework provides a tool for unsupervised graph embedding comparison. To achieve it, we needed to generalize the well-known Chung-Lu model to incorporate geometry, which is interesting in its own right. To test our framework, we performed a number of experiments with synthetic as well as real-world networks and various embedding algorithms.
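For context, the classical Chung-Lu model mentioned above connects each pair of vertices independently with probability roughly proportional to the product of their expected degrees; the sketch below uses networkx's implementation of this base (non-geometric) model with an arbitrary weight sequence, and does not include the geometric generalization introduced in the paper.

```python
import networkx as nx

# Expected-degree (weight) sequence; arbitrary values chosen for illustration.
w = [10, 8, 8, 5, 5, 5, 3, 3, 2, 2, 1, 1]

# Classical Chung-Lu model: edge {i, j} appears independently with probability
# min(1, w_i * w_j / sum(w)); networkx implements it as expected_degree_graph.
G = nx.expected_degree_graph(w, selfloops=False, seed=42)

print(G.number_of_edges())
print(sorted((d for _, d in G.degree()), reverse=True))  # realized degrees track w
```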
Abstract: We examine a version of the Cops and Robber (CR) game in which the robber is invisible, i.e., the cops do not know his location until they capture him. This game (CiR) has apparently received little attention in the CR literature. We examine two variants: in the first the robber is adversarial (he actively tries to avoid capture); in the second he is drunk (he performs a random walk). Our goal in this paper is to study the invisible Cost of Drunkenness (iCOD), which is defined as the ratio $ct_i(G)/dct_i(G)$, where $ct_i(G)$ and $dct_i(G)$ are the expected capture times in the adversarial and drunk CiR variants, respectively. We show that these capture times are well defined, using game theory for the adversarial case and partially observable Markov decision processes (POMDPs) for the drunk case. We give exact asymptotic values of iCOD for several special graph families, such as $d$-regular trees, give some bounds for grids, and provide upper and lower bounds for general classes of graphs. We also give an infinite family of graphs showing that iCOD can be arbitrarily close to any value in $[2,\infty)$. Finally, we briefly examine one more CiR variant, in which the robber is invisible and "infinitely fast"; we argue that this variant is significantly different from the Graph Search game, despite several similarities between the two games.
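To make the drunk variant concrete, the toy Monte Carlo sketch below estimates the expected capture time of a random-walking robber against one fixed, hand-picked cop schedule on a path graph; the sweep schedule and the capture convention are illustrative assumptions only, not the optimal cop strategy that defines $dct_i(G)$.

```python
import random
import networkx as nx

def capture_time(G, cop_schedule, robber_start, rng):
    """One play: the cop follows a fixed walk while the invisible robber
    performs a random walk; return the round in which capture occurs."""
    robber = robber_start
    for t, cop in enumerate(cop_schedule):
        if cop == robber:                                 # cop steps onto the robber
            return t
        robber = rng.choice(list(G.neighbors(robber)))    # drunk (random-walk) move
        if robber == cop:                                 # robber stumbles onto the cop
            return t
    return len(cop_schedule)                              # truncate: not caught in time

# Toy setup: a single cop sweeping a path back and forth (an ad hoc schedule,
# not the optimal one that defines dct_i).
n = 10
G = nx.path_graph(n)
schedule = (list(range(n)) + list(range(n - 2, 0, -1))) * 10

rng = random.Random(0)
samples = [capture_time(G, schedule, rng.randrange(n), rng) for _ in range(5000)]
print(sum(samples) / len(samples))    # Monte Carlo estimate of the expected capture time
```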
Abstract: The cops and robbers game has been extensively studied under the assumption of optimal play by both the cops and the robber. In this paper, we study the problem in which cops chase a drunk robber (that is, a robber who performs a random walk) on a graph. Our main goal is to characterize the "cost of drunkenness." Specifically, we study the ratio of the expected capture times against an optimal robber and against a drunk one. We also examine the algorithmic side of the problem, that is, how to compute near-optimal search schedules for the cops. Finally, we present a preliminary investigation of the invisible robber game and point out differences between this game and graph search.
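In display form, and with notation paralleling the invisible-robber abstract above (the symbols here are an assumed convention, not necessarily those of the paper), the cost of drunkenness studied is
\[
\mathrm{COD}(G) \;=\; \frac{ct(G)}{dct(G)},
\]
where $ct(G)$ and $dct(G)$ denote the expected capture times against an optimally evading robber and against a drunk robber, respectively; since the drunk robber is easier to catch, $\mathrm{COD}(G) \ge 1$.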