Abstract:The presence of a large number of bots on social media has adverse effects. The graph neural network (GNN) can effectively leverage the social relationships between users and achieve excellent results in detecting bots. Recently, more and more GNN-based methods have been proposed for bot detection. However, the existing GNN-based bot detection methods only focus on low-frequency information and seldom consider high-frequency information, which limits the representation ability of the model. To address this issue, this paper proposes a Multi-scale with Signed-attention Graph Filter for social bot detection called MSGS. MSGS could effectively utilize both high and low-frequency information in the social graph. Specifically, MSGS utilizes a multi-scale structure to produce representation vectors at different scales. These representations are then combined using a signed-attention mechanism. Finally, multi-scale representations via MLP after polymerization to produce the final result. We analyze the frequency response and demonstrate that MSGS is a more flexible and expressive adaptive graph filter. MSGS can effectively utilize high-frequency information to alleviate the over-smoothing problem of deep GNNs. Experimental results on real-world datasets demonstrate that our method achieves better performance compared with several state-of-the-art social bot detection methods.
Abstract:The presence of a large number of bots on social media leads to adverse effects. Although Random forest algorithm is widely used in bot detection and can significantly enhance the performance of weak classifiers, it cannot utilize the interaction between accounts. This paper proposes a Random Forest boosted Graph Neural Network for social bot detection, called RF-GNN, which employs graph neural networks (GNNs) as the base classifiers to construct a random forest, effectively combining the advantages of ensemble learning and GNNs to improve the accuracy and robustness of the model. Specifically, different subgraphs are constructed as different training sets through node sampling, feature selection, and edge dropout. Then, GNN base classifiers are trained using various subgraphs, and the remaining features are used for training Fully Connected Netural Network (FCN). The outputs of GNN and FCN are aligned in each branch. Finally, the outputs of all branches are aggregated to produce the final result. Moreover, RF-GNN is compatible with various widely-used GNNs for node classification. Extensive experimental results demonstrate that the proposed method obtains better performance than other state-of-the-art methods.
Abstract:The presence of a large number of bots in Online Social Networks (OSN) leads to undesirable social effects. Graph neural networks (GNNs) have achieved state-of-the-art performance in bot detection since they can effectively utilize user interaction. In most scenarios, the distribution of bots and humans is imbalanced, resulting in under-represent minority class samples and sub-optimal performance. However, previous GNN-based methods for bot detection seldom consider the impact of class-imbalanced issues. In this paper, we propose an over-sampling strategy for GNN (OS-GNN) that can mitigate the effect of class imbalance in bot detection. Compared with previous over-sampling methods for GNNs, OS-GNN does not call for edge synthesis, eliminating the noise inevitably introduced during the edge construction. Specifically, node features are first mapped to a feature space through neighborhood aggregation and then generated samples for the minority class in the feature space. Finally, the augmented features are fed into GNNs to train the classifiers. This framework is general and can be easily extended into different GNN architectures. The proposed framework is evaluated using three real-world bot detection benchmark datasets, and it consistently exhibits superiority over the baselines.
Abstract:The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.