Abstract:Modeling and analysis for event series generated by heterogeneous users of various behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to classify users into behavior-based categories and analyze each of them separately. However, this approach requires extensive data to fully understand user behavior, presenting challenges in modeling newcomers without historical knowledge. In this paper, we propose a novel discrete event prediction framework for new users through the lens of causal inference. Our method offers an unbiased prediction for new users without needing to know their categories. We treat the user event history as the ''treatment'' for future events and the user category as the key confounder. Thus, the prediction problem can be framed as counterfactual outcome estimation, with the new user model trained on an adjusted dataset where each event is re-weighted by its inverse propensity score. We demonstrate the superior performance of the proposed framework with a numerical simulation study and two real-world applications, including Netflix rating prediction and seller contact prediction for customer support at Amazon.
Abstract:Modeling and estimation for spatial data are ubiquitous in real life, frequently appearing in weather forecasting, pollution detection, and agriculture. Spatial data analysis often involves processing datasets of enormous scale. In this work, we focus on large-scale internet-quality open datasets from Ookla. We look into estimating mobile (cellular) internet quality at the scale of a state in the United States. In particular, we aim to conduct estimation based on highly {\it imbalanced} data: Most of the samples are concentrated in limited areas, while very few are available in the rest, posing significant challenges to modeling efforts. We propose a new adaptive kernel regression approach that employs self-tuning kernels to alleviate the adverse effects of data imbalance in this problem. Through comparative experimentation on two distinct mobile network measurement datasets, we demonstrate that the proposed self-tuning kernel regression method produces more accurate predictions, with the potential to be applied in other applications.
Abstract:Spatio-temporal event data are becoming increasingly available in a wide variety of applications, such as electronic transaction records, social network data, and crime data. How to efficiently detect anomalies in these dynamic systems using these streaming event data? We propose a novel anomaly detection framework for such event data combining the Long short-term memory (LSTM) and marked spatio-temporal point processes. The detection procedure can be computed in an online and distributed fashion via feeding the streaming data through an LSTM and a neural network-based discriminator. We study the false-alarm-rate and detection delay using theory and simulation and show that it can achieve weak signal detection by aggregating local statistics over time and networks. Finally, we demonstrate the good performance using real-world datasets.