Abstract: Predicting crime hotspots in a city is a complex and critical task with significant societal implications. Numerous spatiotemporal correlations and irregularities make this task challenging. Existing methods commonly employ fixed time granularities and sequence prediction models. However, determining an appropriate time granularity is difficult, which leads to inaccurate predictions for specific time windows. For example, a user might ask: what are the crime hotspots during 12:00-20:00? To address this issue, we introduce FlexiCrime, a novel event-centric framework for predicting crime hotspots over flexible time intervals. FlexiCrime incorporates a continuous-time attention network that captures correlations between crime events and learns crime context features, which represent general crime patterns across time points and locations. Furthermore, we introduce a type-aware spatiotemporal point process that learns crime-evolving features, which measure the risk of specific crime types at a given time and location by considering the frequency of past crime events. Together, the crime context features and the crime-evolving features allow us to predict whether an urban area is a crime hotspot in a given future time interval. To evaluate FlexiCrime's effectiveness, we conducted experiments using real-world datasets from two cities, covering twelve crime types. The results show that our model outperforms baseline techniques in predicting crime hotspots over flexible time intervals.
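To make the idea of querying an arbitrary time window concrete, the following is a minimal sketch and not FlexiCrime's model. It swaps in a textbook temporal Hawkes process with an exponential kernel in place of the learned type-aware spatiotemporal point process, and it ignores location and crime type. The parameters `mu`, `alpha`, `beta` and the function names are assumptions made for illustration. Because the intensity is defined in continuous time, a risk score for any interval such as 12:00-20:00 can be obtained by integrating it, with no fixed time granularity.

```python
import math

def intensity(t, past_times, mu, alpha, beta):
    """Hawkes-style intensity at time t: base rate plus decaying
    contributions from earlier events (assumed exponential kernel)."""
    return mu + alpha * sum(math.exp(-beta * (t - s)) for s in past_times if s < t)

def interval_risk(start, end, past_times, mu, alpha, beta):
    """Closed-form integral of the intensity over [start, end],
    using only events observed at or before `start`."""
    history = [s for s in past_times if s <= start]
    excitation = sum(
        math.exp(-beta * (start - s)) - math.exp(-beta * (end - s))
        for s in history
    )
    return mu * (end - start) + (alpha / beta) * excitation

# Hypothetical event times in hours since midnight (illustrative only).
past_events = [1.0, 9.5, 11.0]
risk_12_to_20 = interval_risk(12.0, 20.0, past_events, mu=0.1, alpha=0.2, beta=0.5)
```

Thresholding such a score per region would give a hotspot decision for the queried interval. How FlexiCrime actually combines context and evolving features is not specified at this level of detail in the abstract.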
Abstract: As graphs grow larger, full-batch GNN training no longer fits in the memory of a single GPU. To improve the scalability of GNN training, some studies have proposed sampling-based mini-batch training and distributed graph learning. However, these methods still have drawbacks, such as performance degradation and heavy communication. This paper introduces SliceGCN, a feature-sliced distributed method for large-scale graph learning. SliceGCN slices the node features so that each computing device (i.e., GPU) handles only part of the features, which allows a single GPU's memory to hold the entire graph structure. After each GPU processes its share, the resulting partial representations are concatenated to form complete representations. This design aims to avoid the accuracy loss typically associated with mini-batch training (due to incomplete graph structures) and to reduce inter-GPU communication during message passing (the forward propagation process of GNNs). To study and mitigate potential accuracy reductions caused by slicing features, this paper proposes feature fusion and slice encoding. Experiments were conducted on six node classification datasets. The results indicate that while SliceGCN does not improve efficiency on smaller datasets, it does improve efficiency on larger datasets. Additionally, we found that SliceGCN and its variants converge better, and that feature fusion and slice encoding make training more stable and reduce accuracy fluctuations. This study also found that the design of SliceGCN is potentially parameter-efficient.
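The feature-slicing step described above can be sketched in NumPy as follows. This is an illustrative single-process simulation, not the SliceGCN implementation: the "devices" are just list entries, the layer is a plain GCN propagation without nonlinearity, and feature fusion and slice encoding are omitted. The function names and the toy graph are assumptions.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def sliced_gcn_layer(a_norm, features, weights):
    """Split features column-wise, one slice per simulated device.

    Every device sees the full graph (a_norm) but only its own
    feature slice and its own weight matrix; the partial outputs
    are concatenated into the complete representation.
    """
    feature_slices = np.array_split(features, len(weights), axis=1)
    partials = [a_norm @ x_i @ w_i for x_i, w_i in zip(feature_slices, weights)]
    return np.concatenate(partials, axis=1)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
features = rng.normal(size=(4, 6))                      # 4 nodes, 6 features
weights = [rng.normal(size=(3, 2)) for _ in range(2)]   # 2 simulated devices
a_norm = normalized_adjacency(adj)
out = sliced_gcn_layer(a_norm, features, weights)       # shape (4, 4)
```

In this simplified form, concatenating the partial outputs is equivalent to one full layer whose weight matrix is block-diagonal, which is one way to see why a sliced design may need fewer parameters than a dense layer of the same width.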
Abstract: Urban region representation is crucial for various urban downstream tasks. However, despite the proliferation of methods and their success, acquiring general urban region knowledge and adapting it to different tasks remains challenging. Previous work often neglects the spatial structures and functional layouts among entities, which limits its ability to capture transferable knowledge across regions. Further, existing methods struggle to adapt effectively to specific downstream tasks, as they do not adequately address the unique features and relationships that different tasks require. In this paper, we propose a $\textbf{G}$raph-based $\textbf{U}$rban $\textbf{R}$egion $\textbf{P}$re-training and $\textbf{P}$rompting framework ($\textbf{GURPP}$) for region representation learning. Specifically, we first construct an urban region graph that integrates detailed spatial entity data for more effective urban region representation. Then, we develop a subgraph-centric urban region pre-training model to capture the heterogeneous and transferable patterns of interactions among entities. To further enhance the adaptability of the resulting embeddings to different tasks, we design two graph-based prompting methods that incorporate explicit and hidden task knowledge. Extensive experiments on various urban region prediction tasks and different cities demonstrate the superior performance of our GURPP framework. The implementation is available at this repository: https://anonymous.4open.science/r/GURPP.
Abstract: High frame-rate (HFR) videos for action recognition improve fine-grained expression but reduce spatio-temporal relations and motion information density. Traditional data-driven training therefore requires large numbers of video samples. However, samples are not always sufficient in real-world scenarios, which motivates research on few-shot action recognition (FSAR). We observe that most recent FSAR works build the spatio-temporal relation of video samples via temporal alignment after spatial feature extraction, which separates the spatial and temporal features within samples. They also capture motion information from a narrow perspective, between adjacent frames and without considering density, which leads to insufficient motion information. Therefore, we propose a novel plug-and-play architecture for FSAR called Spatio-tempOral frAme tuPle enhancer (SOAP). We refer to the model built on this architecture as SOAP-Net. Instead of performing simple feature extraction, SOAP-Net considers temporal connections between different feature channels and the spatio-temporal relation of features. It also captures comprehensive motion information using frame tuples of multiple frames, which contain more motion information than adjacent frames. Combining frame tuples of diverse frame counts further provides a broader perspective. SOAP-Net achieves new state-of-the-art performance across well-known benchmarks such as SthSthV2, Kinetics, UCF101, and HMDB51. Extensive empirical evaluations underscore the competitiveness, pluggability, generalization, and robustness of SOAP. The code is released at https://github.com/wenbohuang1002/SOAP.
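As a rough illustration of frame tuples with diverse frame counts, the sketch below enumerates index tuples over a clip. It assumes tuples are consecutive sliding windows, which the abstract does not specify, and it only produces frame indices, not the features SOAP computes from them. The function names are hypothetical.

```python
def frame_tuples(num_frames, tuple_size):
    """All windows of `tuple_size` consecutive frame indices (assumed layout)."""
    return [
        tuple(range(start, start + tuple_size))
        for start in range(num_frames - tuple_size + 1)
    ]

def multi_scale_tuples(num_frames, tuple_sizes):
    """Frame tuples for several frame counts, keyed by tuple size."""
    return {size: frame_tuples(num_frames, size) for size in tuple_sizes}

# For an 8-frame clip, size 2 corresponds to adjacent-frame pairs,
# while sizes 3 and 4 span longer motion within one tuple.
tuples_by_size = multi_scale_tuples(8, (2, 3, 4))
```

Larger tuple sizes yield fewer tuples per clip, each covering more frames, which is the sense in which combining sizes gives a broader view of motion than adjacent pairs alone.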
Abstract: Node classification is an important problem in graph-based fraud detection. Many existing works adopt Graph Neural Networks (GNNs) to enhance fraud detectors. While promising, most current GNN-based fraud detectors fail to generalize to the low homophily setting. Besides, label utilization has been shown to be a significant factor in node classification, but we find that existing label utilization methods are less effective in fraud detection tasks due to the low homophily of the graphs. In this work, we propose GAGA, a novel Group AGgregation enhanced TrAnsformer, to tackle the above challenges. Specifically, group aggregation provides a portable method to cope with the low homophily issue. This aggregation explicitly integrates label information to generate distinguishable neighborhood information. Along with group aggregation, we propose an end-to-end trainable group encoding that augments the original feature space with the class labels. Meanwhile, we devise two additional learnable encodings to recognize the structural and relational context. We then combine the group aggregation and the learnable encodings in a Transformer encoder to capture the semantic information. Experimental results show that GAGA outperforms other competitive graph-based fraud detectors by up to 24.39% on two trending public datasets and a real-world industrial dataset from Anonymous. Moreover, group aggregation is demonstrated to outperform other label utilization methods (e.g., C&S, BoT/UniMP) in the low homophily setting.
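One way to picture label-aware group aggregation is the sketch below. It is an assumption-laden illustration rather than GAGA's implementation: it assumes three groups (known benign, known fraud, unlabeled), one-hop neighbors, and mean pooling, none of which are stated in the abstract. The constants and function name are hypothetical.

```python
import numpy as np

# Assumed label encoding for this sketch only.
UNLABELED, BENIGN, FRAUD = -1, 0, 1

def group_aggregate(features, neighbors, labels, node):
    """Mean-pool a node's neighbors separately per label group.

    Returns an array of shape (3, feature_dim) ordered as
    benign, fraud, unlabeled. An empty group yields a zero vector.
    """
    dim = features.shape[1]
    groups = []
    for group_label in (BENIGN, FRAUD, UNLABELED):
        members = [n for n in neighbors[node] if labels[n] == group_label]
        if members:
            groups.append(features[members].mean(axis=0))
        else:
            groups.append(np.zeros(dim))
    return np.stack(groups)

features = np.array([[1.0, 0.0], [0.0, 2.0], [4.0, 4.0], [2.0, 0.0], [6.0, 6.0]])
labels = np.array([UNLABELED, BENIGN, FRAUD, BENIGN, UNLABELED])
neighbors = {0: [1, 2, 3, 4]}
sequence = group_aggregate(features, neighbors, labels, node=0)
```

Keeping fraud and benign neighbors in separate vectors, instead of averaging them together, is what lets the neighborhood stay distinguishable when a fraud node is mostly connected to benign nodes. The resulting per-group vectors form a short sequence that a Transformer encoder could consume.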