Alan
Abstract:The problem of forecasting spatiotemporal events such as crimes and accidents is crucial to public safety and city management. Besides accuracy, interpretability is also a key requirement for spatiotemporal forecasting models to justify the decisions. Interpretation of the spatiotemporal forecasting mechanism is, however, challenging due to the complexity of multi-source spatiotemporal features, the non-intuitive nature of spatiotemporal patterns for non-expert users, and the presence of spatial heterogeneity in the data. Currently, no existing deep learning model intrinsically interprets the complex predictive process learned from multi-source spatiotemporal features. To bridge the gap, we propose GeoPro-Net, an intrinsically interpretable spatiotemporal model for spatiotemporal event forecasting problems. GeoPro-Net introduces a novel Geo-concept convolution operation, which employs statistical tests to extract predictive patterns in the input as Geo-concepts, and condenses the Geo-concept-encoded input through interpretable channel fusion and geographic-based pooling. In addition, GeoPro-Net learns different sets of prototypes of concepts inherently, and projects them to real-world cases for interpretation. Comprehensive experiments and case studies on four real-world datasets demonstrate that GeoPro-Net provides better interpretability while still achieving competitive prediction performance compared with state-of-the-art baselines.
Abstract:As pre-trained models, like Transformers, are increasingly deployed on cloud platforms for inference services, the privacy concerns surrounding model parameters and inference data are becoming more acute. Current Privacy-Preserving Transformer Inference (PPTI) frameworks struggle with the "impossible trinity" of privacy, efficiency, and performance. For instance, Secure Multi-Party Computation (SMPC)-based solutions offer strong privacy guarantees but come with significant inference overhead and performance trade-offs. On the other hand, PPTI frameworks that use random permutations achieve inference efficiency close to that of plaintext and maintain accurate results but require exposing some model parameters and intermediate results, thereby risking substantial privacy breaches. Addressing this "impossible trinity" with a single technique proves challenging. To overcome this challenge, we propose Centaur, a novel hybrid PPTI framework. Unlike existing methods, Centaur protects model parameters with random permutations and inference data with SMPC, leveraging the structure of Transformer models. By designing a series of efficient privacy-preserving algorithms, Centaur leverages the strengths of both techniques to achieve a better balance between privacy, efficiency, and performance in PPTI. We comprehensively evaluate the effectiveness of Centaur on various types of Transformer models and datasets. Experimental results demonstrate that the privacy protection capabilities offered by Centaur can withstand various existing model inversion attack methods. In terms of performance and efficiency, Centaur not only maintains the same performance as plaintext inference but also improves inference speed by $5.0-30.4$ times.
Abstract:In recent years, multimodal benchmarks for general domains have guided the rapid development of multimodal models on general tasks. However, the financial field has its peculiarities. It features unique graphical images (e.g., candlestick charts, technical indicator charts) and possesses a wealth of specialized financial knowledge (e.g., futures, turnover rate). Therefore, benchmarks from general fields often fail to measure the performance of multimodal models in the financial domain, and thus cannot effectively guide the rapid development of large financial models. To promote the development of large financial multimodal models, we propose MME-Finance, an bilingual open-ended and practical usage-oriented Visual Question Answering (VQA) benchmark. The characteristics of our benchmark are finance and expertise, which include constructing charts that reflect the actual usage needs of users (e.g., computer screenshots and mobile photography), creating questions according to the preferences in financial domain inquiries, and annotating questions by experts with 10+ years of experience in the financial industry. Additionally, we have developed a custom-designed financial evaluation system in which visual information is first introduced in the multi-modal evaluation process. Extensive experimental evaluations of 19 mainstream MLLMs are conducted to test their perception, reasoning, and cognition capabilities. The results indicate that models performing well on general benchmarks cannot do well on MME-Finance; for instance, the top-performing open-source and closed-source models obtain 65.69 (Qwen2VL-72B) and 63.18 (GPT-4o), respectively. Their performance is particularly poor in categories most relevant to finance, such as candlestick charts and technical indicator charts. In addition, we propose a Chinese version, which helps compare performance of MLLMs under a Chinese context.
Abstract:Time series forecasting is extensively applied across diverse domains. Transformer-based models demonstrate significant potential in modeling cross-time and cross-variable interaction. However, we notice that the cross-variable correlation of multivariate time series demonstrates multifaceted (positive and negative correlations) and dynamic progression over time, which is not well captured by existing Transformer-based models. To address this issue, we propose a TimeCNN model to refine cross-variable interactions to enhance time series forecasting. Its key innovation is timepoint-independent, where each time point has an independent convolution kernel, allowing each time point to have its independent model to capture relationships among variables. This approach effectively handles both positive and negative correlations and adapts to the evolving nature of variable relationships over time. Extensive experiments conducted on 12 real-world datasets demonstrate that TimeCNN consistently outperforms state-of-the-art models. Notably, our model achieves significant reductions in computational requirements (approximately 60.46%) and parameter count (about 57.50%), while delivering inference speeds 3 to 4 times faster than the benchmark iTransformer model
Abstract:Machine learning techniques have garnered great interest in designing communication systems owing to their capacity in tacking with channel uncertainty. To provide theoretical guarantees for learning-based communication systems, some recent works analyze generalization bounds for devised methods based on the assumption of Independently and Identically Distributed (I.I.D.) channels, a condition rarely met in practical scenarios. In this paper, we drop the I.I.D. channel assumption and study an online optimization problem of learning to communicate over time-correlated channels. To address this issue, we further focus on two specific tasks: optimizing channel decoders for time-correlated fading channels and selecting optimal codebooks for time-correlated additive noise channels. For utilizing temporal dependence of considered channels to better learn communication systems, we develop two online optimization algorithms based on the optimistic online mirror descent framework. Furthermore, we provide theoretical guarantees for proposed algorithms via deriving sub-linear regret bound on the expected error probability of learned systems. Extensive simulation experiments have been conducted to validate that our presented approaches can leverage the channel correlation to achieve a lower average symbol error rate compared to baseline methods, consistent with our theoretical findings.
Abstract:Recent advancements in the collection and analysis of sequential educational data have brought time series analysis to a pivotal position in educational research, highlighting its essential role in facilitating data-driven decision-making. However, there is a lack of comprehensive summaries that consolidate these advancements. To the best of our knowledge, this paper is the first to provide a comprehensive review of time series analysis techniques specifically within the educational context. We begin by exploring the landscape of educational data analytics, categorizing various data sources and types relevant to education. We then review four prominent time series methods-forecasting, classification, clustering, and anomaly detection-illustrating their specific application points in educational settings. Subsequently, we present a range of educational scenarios and applications, focusing on how these methods are employed to address diverse educational tasks, which highlights the practical integration of multiple time series methods to solve complex educational problems. Finally, we conclude with a discussion on future directions, including personalized learning analytics, multimodal data fusion, and the role of large language models (LLMs) in educational time series. The contributions of this paper include a detailed taxonomy of educational data, a synthesis of time series techniques with specific educational applications, and a forward-looking perspective on emerging trends and future research opportunities in educational analysis. The related papers and resources are available and regularly updated at the project page.
Abstract:Process mining, as a high-level field in data mining, plays a crucial role in enhancing operational efficiency and decision-making across organizations. In this survey paper, we delve into the growing significance and ongoing trends in the field of process mining, advocating a specific viewpoint on its contents, application, and development in modern businesses and process management, particularly in cross-organizational settings. We first summarize the framework of process mining, common industrial applications, and the latest advances combined with artificial intelligence, such as workflow optimization, compliance checking, and performance analysis. Then, we propose a holistic framework for intelligent process analysis and outline initial methodologies in cross-organizational settings, highlighting both challenges and opportunities. This particular perspective aims to revolutionize process mining by leveraging artificial intelligence to offer sophisticated solutions for complex, multi-organizational data analysis. By integrating advanced machine learning techniques, we can enhance predictive capabilities, streamline processes, and facilitate real-time decision-making. Furthermore, we pinpoint avenues for future investigations within the research community, encouraging the exploration of innovative algorithms, data integration strategies, and privacy-preserving methods to fully harness the potential of process mining in diverse, interconnected business environments.
Abstract:The multiplayer online battle arena (MOBA) genre has gained significant popularity and economic success, attracting considerable research interest within the Human-Computer Interaction community. Enhancing the gaming experience requires a deep understanding of player behavior, and a crucial aspect of MOBA games is matchmaking, which aims to assemble teams of comparable skill levels. However, existing matchmaking systems often neglect important factors such as players' position preferences and team assignment, resulting in imbalanced matches and reduced player satisfaction. To address these limitations, this paper proposes a novel framework called CUPID, which introduces a novel process called ``re-matchmaking'' to optimize team and position assignments to improve both fairness and player satisfaction. CUPID incorporates a pre-filtering step to ensure a minimum level of matchmaking quality, followed by a pre-match win-rate prediction model that evaluates the fairness of potential assignments. By simultaneously considering players' position satisfaction and game fairness, CUPID aims to provide an enhanced matchmaking experience. Extensive experiments were conducted on two large-scale, real-world MOBA datasets to validate the effectiveness of CUPID. The results surpass all existing state-of-the-art baselines, with an average relative improvement of 7.18% in terms of win prediction accuracy. Furthermore, CUPID has been successfully deployed in a popular online mobile MOBA game. The deployment resulted in significant improvements in match fairness and player satisfaction, as evidenced by critical Human-Computer Interaction (HCI) metrics covering usability, accessibility, and engagement, observed through A/B testing. To the best of our knowledge, CUPID is the first re-matchmaking system designed specifically for large-scale MOBA games.
Abstract:Product attribute value extraction involves identifying the specific values associated with various attributes from a product profile. While existing methods often prioritize the development of effective models to improve extraction performance, there has been limited emphasis on extraction efficiency. However, in real-world scenarios, products are typically associated with multiple attributes, necessitating multiple extractions to obtain all corresponding values. In this work, we propose an Efficient product Attribute Value Extraction (EAVE) approach via lightweight sparse-layer interaction. Specifically, we employ a heavy encoder to separately encode the product context and attribute. The resulting non-interacting heavy representations of the context can be cached and reused for all attributes. Additionally, we introduce a light encoder to jointly encode the context and the attribute, facilitating lightweight interactions between them. To enrich the interaction within the lightweight encoder, we design a sparse-layer interaction module to fuse the non-interacting heavy representation into the lightweight encoder. Comprehensive evaluation on two benchmarks demonstrate that our method achieves significant efficiency gains with neutral or marginal loss in performance when the context is long and number of attributes is large. Our code is available \href{https://anonymous.4open.science/r/EAVE-EA18}{here}.
Abstract:Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations of SMP. In this work, we unveil some potential pitfalls of SMP and their remedies. We first identify two limitations of SMP: undesirable representation update for multi-hop neighbors and vulnerability against oversmoothing issues. To overcome these challenges, we propose a novel message passing function called Multiset to Multiset GNN(M2M-GNN). Our theoretical analyses and extensive experiments demonstrate that M2M-GNN effectively alleviates the aforementioned limitations of SMP, yielding superior performance in comparison