Abstract:This paper presents DeepDelveAI, a comprehensive dataset specifically curated to identify AI-related research papers from a large-scale academic literature database. The dataset was created using an advanced Long Short-Term Memory (LSTM) model trained on a binary classification task to distinguish between AI-related and non-AI-related papers. The model was trained and validated on a vast dataset, achieving high accuracy, precision, recall, and F1-score. The resulting DeepDelveAI dataset comprises over 9.4 million AI-related papers published since Dartmouth Conference, from 1956 to 2024, providing a crucial resource for analyzing trends, thematic developments, and the evolution of AI research across various disciplines.
Abstract:With the recent adoption of laws supporting the ``right to be forgotten'' and the widespread use of Graph Neural Networks for modeling graph-structured data, graph unlearning has emerged as a crucial research area. Current studies focus on the efficient update of model parameters. However, they often overlook the time-consuming re-computation of graph propagation required for each removal, significantly limiting their scalability on large graphs. In this paper, we present ScaleGUN, the first certifiable graph unlearning mechanism that scales to billion-edge graphs. ScaleGUN employs a lazy local propagation method to facilitate efficient updates of the embedding matrix during data removal. Such lazy local propagation can be proven to ensure certified unlearning under all three graph unlearning scenarios, including node feature, edge, and node unlearning. Extensive experiments on real-world datasets demonstrate the efficiency and efficacy of ScaleGUN. Remarkably, ScaleGUN accomplishes $(\epsilon,\delta)=(1,10^{-4})$ certified unlearning on the billion-edge graph ogbn-papers100M in 20 seconds for a $5K$-random-edge removal request -- of which only 5 seconds are required for updating the embedding matrix -- compared to 1.91 hours for retraining and 1.89 hours for re-propagation. Our code is available online.
Abstract:Graph neural networks (GNNs) have emerged as a powerful tool for effectively mining and learning from graph-structured data, with applications spanning numerous domains. However, most research focuses on static graphs, neglecting the dynamic nature of real-world networks where topologies and attributes evolve over time. By integrating sequence modeling modules into traditional GNN architectures, dynamic GNNs aim to bridge this gap, capturing the inherent temporal dependencies of dynamic graphs for a more authentic depiction of complex networks. This paper provides a comprehensive review of the fundamental concepts, key techniques, and state-of-the-art dynamic GNN models. We present the mainstream dynamic GNN models in detail and categorize models based on how temporal information is incorporated. We also discuss large-scale dynamic GNNs and pre-training techniques. Although dynamic GNNs have shown superior performance, challenges remain in scalability, handling heterogeneous information, and lack of diverse graph datasets. The paper also discusses possible future directions, such as adaptive and memory-enhanced models, inductive learning, and theoretical analysis.