Abstract:Attributed networks containing entity-specific information in node attributes are ubiquitous in modeling social networks, e-commerce, bioinformatics, etc. Their inherent network topology ranges from simple graphs to hypergraphs with high-order interactions and multiplex graphs with separate layers. An important graph mining task is node clustering, aiming to partition the nodes of an attributed network into k disjoint clusters such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes for effective clustering on multiple types of attributed networks. In this paper, we first present AHCKA as an efficient approach to attributed hypergraph clustering (AHC). AHCKA includes a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, a joint hypergraph random walk model to devise an effective AHC objective, and an efficient solver with speedup techniques for the objective optimization. The proposed techniques are extensible to various types of attributed networks, and thus, we develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and AHC. Moreover, we devise ANCKA with algorithmic designs tailored for GPU acceleration to boost efficiency. We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.
Abstract:Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes that are associated with rich attributes, such as customer-product purchase networks and author-paper authorship graphs. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains, including social network analysis, recommendation systems, information retrieval, and bioinformatics. However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately, engendering severely compromised result quality. The severity of these issues is accentuated in real ABGs, which often encompass millions of nodes and a sheer volume of attribute data, rendering effective k-ABGC over such graphs highly challenging. In this paper, we propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets. TPO obtains high clustering quality through two major contributions: (i) a novel formulation and transformation of the k-ABGC problem based on multi-scale attribute affinity specialized for capturing attribute affinities between nodes with the consideration of their multi-hop connections in ABGs, and (ii) a highly efficient solver that includes a suite of carefully-crafted optimizations for sidestepping explicit affinity matrix construction and facilitating faster convergence. Extensive experiments, comparing TPO against 19 baselines over 5 real ABGs, showcase the superior clustering quality of TPO measured against ground-truth labels. Moreover, compared to the state of the arts, TPO is often more than 40x faster over both small and large ABGs.
Abstract:Heterogeneous graphs are ubiquitous to model complex data. There are urgent needs on powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node $v$ are forced to be transformed to the feature space of $v$ for aggregation, though the neighbors are in different types. That is, the semantics in different node types are entangled together into node $v$'s representation. To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT, to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. The superiority of SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.
Abstract:In this letter, we investigate enhancing the physical layer security (PLS) for the dual-function radar-communication (DFRC) system with hybrid beamforming (HBF) architecture, where the base station (BS) achieves downlink communication and radar target detection simultaneously. We consider an eavesdropper intercepting the information transmitted from the BS to the downlink communication users with imperfectly known channel state information. Additionally, the location of the radar target is also imperfectly known by the BS. To enhance PLS in the considered DFRC system, we propose a novel HBF architecture, which introduces a new integrated sensing and security (I2S) symbol. The secure HBF design problem for DFRC is formulated by maximizing the minimum legitimate user communication rate subject to radar interference-plus-noise ratio, eavesdropping rate, hardware and power constraints. To solve this non-convex problem, we propose an alternating optimization based method to jointly optimize transmit and receive beamformers. Numerical simulation results validate the effectiveness of the proposed algorithm and show the superiority of the proposed I2S-aided HBF architecture for achieving DFRC and enhancing PLS.
Abstract:A bipartite graph contains inter-set edges between two disjoint vertex sets, and is widely used to model real-world data, such as user-item purchase records, author-article publications, and biological interactions between drugs and proteins. k-Bipartite Graph Clustering (k-BGC) is to partition the target vertex set in a bipartite graph into k disjoint clusters. The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics, to name a few. Existing approaches to k-BGC either output clustering results with compromised quality due to inadequate exploitation of high-order information between vertices, or fail to handle sizable bipartite graphs with billions of edges. Motivated by this, this paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs. HOPE obtains high scalability and effectiveness through a new k-BGC problem formulation based on the novel notion of high-order perspective (HOP) vectors and an efficient technique for low-rank approximation of HOP vectors. HOPE+ further elevates the k-BGC performance to another level with a judicious problem transformation and a highly efficient two-stage optimization framework. Two variants, HOPE+ (FNEM) and HOPE+ (SNEM) are designed when either the Frobenius norm or spectral norm is applied in the transformation. Extensive experiments, comparing HOPE and HOPE+ against 13 competitors on 10 real-world datasets, exhibit that our solutions, especially HOPE+, are superior to existing methods in terms of result quality, while being up to orders of magnitude faster. On the largest dataset MAG with 1.1 billion edges, HOPE+ is able to produce clusters with the highest clustering accuracy within 31 minutes, which is unmatched by any existing solution for k-BGC.
Abstract:Graph-level representation learning is important in a wide range of applications. However, existing graph-level models are generally built on i.i.d. assumption for both training and testing graphs, which is not realistic in an open world, where models can encounter out-of-distribution (OOD) testing graphs that are from different distributions unknown during training. A trustworthy model should not only produce accurate predictions for in-distribution (ID) data, but also detect OOD graphs to avoid unreliable prediction. In this paper, we present SGOOD, a novel graph-level OOD detection framework. We find that substructure differences commonly exist between ID and OOD graphs. Hence, SGOOD explicitly utilizes substructures to learn powerful representations to achieve superior performance. Specifically, we build a super graph of substructures for every graph, and design a two-level graph encoding pipeline that works on both original graphs and super graphs to obtain substructure-enhanced graph representations. To further distinguish ID and OOD graphs, we develop three graph augmentation techniques that preserve substructures and increase expressiveness. Extensive experiments against 10 competitors on numerous graph datasets demonstrate the superiority of SGOOD, often surpassing existing methods by a significant margin. The code is available at https://anonymous.4open.science/r/SGOOD-0958.
Abstract:We study illicit account detection on transaction networks of cryptocurrencies that are increasi_testngly important in online financial markets. The surge of illicit activities on cryptocurrencies has resulted in billions of losses from normal users. Existing solutions either rely on tedious feature engineering to get handcrafted features, or are inadequate to fully utilize the rich semantics of cryptocurrency transaction data, and consequently, yield sub-optimal performance. In this paper, we formulate the illicit account detection problem as a classification task over directed multigraphs with edge attributes, and present DIAM, a novel multi-graph neural network model to effectively detect illicit accounts on large transaction networks. First, DIAM includes an Edge2Seq module that automatically learns effective node representations preserving intrinsic transaction patterns of parallel edges, by considering both edge attributes and directed edge sequence dependencies. Then utilizing the multigraph topology, DIAM employs a new Multigraph Discrepancy (MGD) module with a well-designed message passing mechanism to capture the discrepant features between normal and illicit nodes, supported by an attention mechanism. Assembling all techniques, DIAM is trained in an end-to-end manner. Extensive experiments, comparing against 14 existing solutions on 4 large cryptocurrency datasets of Bitcoin and Ethereum, demonstrate that DIAM consistently achieves the best performance to accurately detect illicit accounts, while being efficient. For instance, on a Bitcoin dataset with 20 million nodes and 203 million edges, DIAM achieves F1 score 96.55%, significantly higher than the F1 score 83.92% of the best competitor.
Abstract:The rise of the click farm business using Multi-purpose Messaging Mobile Apps (MMMAs) tempts cybercriminals to perpetrate crowdsourcing frauds that cause financial losses to click farm workers. In this paper, we propose a novel contrastive multi-view learning method named CMT for crowdsourcing fraud detection over the heterogeneous temporal graph (HTG) of MMMA. CMT captures both heterogeneity and dynamics of HTG and generates high-quality representations for crowdsourcing fraud detection in a self-supervised manner. We deploy CMT to detect crowdsourcing frauds on an industry-size HTG of a representative MMMA WeChat and it significantly outperforms other methods. CMT also shows promising results for fraud detection on a large-scale public financial HTG, indicating that it can be applied in other graph anomaly detection tasks.
Abstract:Graph anomaly detection (GAD) has gained increasing attention in recent years due to its critical application in a wide range of domains, such as social networks, financial risk management, and traffic analysis. Existing GAD methods can be categorized into node and edge anomaly detection models based on the type of graph objects being detected. However, these methods typically treat node and edge anomalies as separate tasks, overlooking their associations and frequent co-occurrences in real-world graphs. As a result, they fail to leverage the complementary information provided by node and edge anomalies for mutual detection. Additionally, state-of-the-art GAD methods, such as CoLA and SL-GAD, heavily rely on negative pair sampling in contrastive learning, which incurs high computational costs, hindering their scalability to large graphs. To address these limitations, we propose a novel unified graph anomaly detection framework based on bootstrapped self-supervised learning (named BOURNE). We extract a subgraph (graph view) centered on each target node as node context and transform it into a dual hypergraph (hypergraph view) as edge context. These views are encoded using graph and hypergraph neural networks to capture the representations of nodes, edges, and their associated contexts. By swapping the context embeddings between nodes and edges and measuring the agreement in the embedding space, we enable the mutual detection of node and edge anomalies. Furthermore, we adopt a bootstrapped training strategy that eliminates the need for negative sampling, enabling BOURNE to handle large graphs efficiently. Extensive experiments conducted on six benchmark datasets demonstrate the superior effectiveness and efficiency of BOURNE in detecting both node and edge anomalies.
Abstract:Detecting anomalous trajectories has become an important task in many location-based applications. While many approaches have been proposed for this task, they suffer from various issues including (1) incapability of detecting anomalous subtrajectories, which are finer-grained anomalies in trajectory data, and/or (2) non-data driven, and/or (3) requirement of sufficient supervision labels which are costly to collect. In this paper, we propose a novel reinforcement learning based solution called RL4OASD, which avoids all aforementioned issues of existing approaches. RL4OASD involves two networks, one responsible for learning features of road networks and trajectories and the other responsible for detecting anomalous subtrajectories based on the learned features, and the two networks can be trained iteratively without labeled data. Extensive experiments are conducted on two real datasets, and the results show that our solution can significantly outperform the state-of-the-art methods (with 20-30% improvement) and is efficient for online detection (it takes less than 0.1ms to process each newly generated data point).