Abstract:The burgeoning e-Commerce sector requires advanced solutions for the detection of transaction fraud. With an increasing risk of financial information theft and account takeovers, deep learning methods have become integral to the embedding of behavior sequence data in fraud detection. However, these methods often struggle to balance modeling capabilities and efficiency and incorporate domain knowledge. To address these issues, we introduce the multitask CNN behavioral Embedding Model for Transaction Fraud Detection. Our contributions include 1) introducing a single-layer CNN design featuring multirange kernels which outperform LSTM and Transformer models in terms of scalability and domain-focused inductive bias, and 2) the integration of positional encoding with CNN to introduce sequence-order signals enhancing overall performance, and 3) implementing multitask learning with randomly assigned label weights, thus removing the need for manual tuning. Testing on real-world data reveals our model's enhanced performance of downstream transaction models and comparable competitiveness with the Transformer Time Series (TST) model.
Abstract:In e-commerce industry, graph neural network methods are the new trends for transaction risk modeling.The power of graph algorithms lie in the capability to catch transaction linking network information, which is very hard to be captured by other algorithms.However, in most existing approaches, transaction or user connections are defined by hard link strategies on shared properties, such as same credit card, same device, same ip address, same shipping address, etc. Those types of strategies will result in sparse linkages by entities with strong identification characteristics (ie. device) and over-linkages by entities that could be widely shared (ie. ip address), making it more difficult to learn useful information from graph. To address aforementioned problems, we present a novel behavioral biometric based method to establish transaction linkings based on user behavioral similarities, then train an unsupervised GNN to extract embedding features for downstream fraud prediction tasks. To our knowledge, this is the first time similarity based soft link has been used in graph embedding applications. To speed up similarity calculation, we apply an in-house GPU based HDBSCAN clustering method to remove highly concentrated and isolated nodes before graph construction. Our experiments show that embedding features learned from similarity based behavioral graph have achieved significant performance increase to the baseline fraud detection model in various business scenarios. In new guest buyer transaction scenario, this segment is a challenge for traditional method, we can make precision increase from 0.82 to 0.86 at the same recall of 0.27, which means we can decrease false positive rate using this method.
Abstract:As one of the most popular machine learning models today, graph neural networks (GNNs) have attracted intense interest recently, and so does their explainability. Users are increasingly interested in a better understanding of GNN models and their outcomes. Unfortunately, today's evaluation frameworks for GNN explainability often rely on synthetic datasets, leading to conclusions of limited scope due to a lack of complexity in the problem instances. As GNN models are deployed to more mission-critical applications, we are in dire need for a common evaluation protocol of explainability methods of GNNs. In this paper, we propose, to our best knowledge, the first systematic evaluation framework for GNN explainability, considering explainability on three different "user needs:" explanation focus, mask nature, and mask transformation. We propose a unique metric that combines the fidelity measures and classify explanations based on their quality of being sufficient or necessary. We scope ourselves to node classification tasks and compare the most representative techniques in the field of input-level explainability for GNNs. For the widely used synthetic benchmarks, surprisingly shallow techniques such as personalized PageRank have the best performance for a minimum computation time. But when the graph structure is more complex and nodes have meaningful features, gradient-based methods, in particular Saliency, are the best according to our evaluation criteria. However, none dominates the others on all evaluation dimensions and there is always a trade-off. We further apply our evaluation protocol in a case study on eBay graphs to reflect the production environment.
Abstract:Detecting fraudulent transactions is an essential component to control risk in e-commerce marketplaces. Apart from rule-based and machine learning filters that are already deployed in production, we want to enable efficient real-time inference with graph neural networks (GNNs), which is useful to catch multihop risk propagation in a transaction graph. However, two challenges arise in the implementation of GNNs in production. First, future information in a dynamic graph should not be considered in message passing to predict the past. Second, the latency of graph query and GNN model inference is usually up to hundreds of milliseconds, which is costly for some critical online services. To tackle these challenges, we propose a Batch and Real-time Inception GrapH Topology (BRIGHT) framework to conduct an end-to-end GNN learning that allows efficient online real-time inference. BRIGHT framework consists of a graph transformation module (Two-Stage Directed Graph) and a corresponding GNN architecture (Lambda Neural Network). The Two-Stage Directed Graph guarantees that the information passed through neighbors is only from the historical payment transactions. It consists of two subgraphs representing historical relationships and real-time links, respectively. The Lambda Neural Network decouples inference into two stages: batch inference of entity embeddings and real-time inference of transaction prediction. Our experiments show that BRIGHT outperforms the baseline models by >2\% in average w.r.t.~precision. Furthermore, BRIGHT is computationally efficient for real-time fraud detection. Regarding end-to-end performance (including neighbor query and inference), BRIGHT can reduce the P99 latency by >75\%. For the inference stage, our speedup is on average 7.8$\times$ compared to the traditional GNN.
Abstract:At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions. Despite the variety of different models for deep learning on graphs, few approaches have been proposed for dealing with graphs that are both heterogeneous and dynamic. In this paper, we propose DyHGN (Dynamic Heterogeneous Graph Neural Network) and its variants to capture both temporal and heterogeneous information. We first construct dynamic heterogeneous graphs from registration and transaction data from eBay. Then, we build models with diachronic entity embedding and heterogeneous graph transformer. We also use model explainability techniques to understand the behaviors of DyHGN-* models. Our findings reveal that modelling graph dynamics with heterogeneous inputs need to be conducted with "attention" depending on the data structure, distribution, and computation cost.
Abstract:Transaction checkout fraud detection is an essential risk control components for E-commerce marketplaces. In order to leverage graph networks to decrease fraud rate efficiently and guarantee the information flow passed through neighbors only from the past of the checkouts, we first present a novel Directed Dynamic Snapshot (DDS) linkage design for graph construction and a Lambda Neural Networks (LNN) architecture for effective inference with Graph Neural Networks embeddings. Experiments show that our LNN on DDS graph, outperforms baseline models significantly and is computational efficient for real-time fraud detection.
Abstract:Massive account registration has raised concerns on risk management in e-commerce companies, especially when registration increases rapidly within a short time frame. To monitor these registrations constantly and minimize the potential loss they might incur, detecting massive registration and predicting their riskiness are necessary. In this paper, we propose a Dynamic Heterogeneous Graph Neural Network framework to capture suspicious massive registrations (DHGReg). We first construct a dynamic heterogeneous graph from the registration data, which is composed of a structural subgraph and a temporal subgraph. Then, we design an efficient architecture to predict suspicious/benign accounts. Our proposed model outperforms the baseline models and is computationally efficient in processing a dynamic heterogeneous graph constructed from a real-world dataset. In practice, the DHGReg framework would benefit the detection of suspicious registration behaviors at an early stage.
Abstract:At online retail platforms, it is crucial to actively detect risks of fraudulent transactions to improve our customer experience, minimize loss, and prevent unauthorized chargebacks. Traditional rule-based methods and simple feature-based models are either inefficient or brittle and uninterpretable. The graph structure that exists among the heterogeneous typed entities of the transaction logs is informative and difficult to fake. To utilize the heterogeneous graph relationships and enrich the explainability, we present xFraud, an explainable Fraud transaction prediction system. xFraud is composed of a predictor which learns expressive representations for malicious transaction detection from the heterogeneous transaction graph via a self-attentive heterogeneous graph neural network, and an explainer that generates meaningful and human understandable explanations from graphs to facilitate further process in business unit. In our experiments with xFraud on two real transaction networks with up to ten millions transactions, we are able to achieve an area under a curve (AUC) score that outperforms baseline models and graph embedding methods. In addition, we show how the explainer could benefit the understanding towards model predictions and enhance model trustworthiness for real-world fraud transaction cases.