Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhong Ming

VisNumBench: Evaluating Number Sense of Multimodal Large Language Models

Mar 19, 2025

Tengjin Weng, Jingyi Wang, Wenhao Jiang, Zhong Ming

Abstract:Can Multimodal Large Language Models (MLLMs) develop an intuitive number sense similar to humans? Targeting this problem, we introduce Visual Number Benchmark (VisNumBench) to evaluate the number sense abilities of MLLMs across a wide range of visual numerical tasks. VisNumBench consists of about 1,900 multiple-choice question-answer pairs derived from both synthetic and real-world visual data, covering seven visual numerical attributes and four types of visual numerical estimation tasks. Our experiments on VisNumBench led to the following key findings: (i) The 17 MLLMs we tested, including open-source models such as Qwen2.5-VL and InternVL2.5, as well as proprietary models like GPT-4o and Gemini 2.0 Flash, perform significantly below human levels in number sense-related tasks. (ii) Multimodal mathematical models and multimodal chain-of-thought (CoT) models did not exhibit significant improvements in number sense abilities. (iii) Stronger MLLMs with larger parameter sizes and broader general abilities demonstrate modest gains in number sense abilities. We believe VisNumBench will serve as a valuable resource for the research community, encouraging further advancements in enhancing MLLMs' number sense abilities. All benchmark resources, including code and datasets, will be publicly available at https://wwwtttjjj.github.io/VisNumBench/.

Via

Access Paper or Ask Questions

A Survey on Sequential Recommendation

Dec 17, 2024

Liwei Pan, Weike Pan, Meiyan Wei, Hongzhi Yin, Zhong Ming

Abstract:Different from most conventional recommendation problems, sequential recommendation focuses on learning users' preferences by exploiting the internal order and dependency among the interacted items, which has received significant attention from both researchers and practitioners. In recent years, we have witnessed great progress and achievements in this field, necessitating a new survey. In this survey, we study the SR problem from a new perspective (i.e., the construction of an item's properties), and summarize the most recent techniques used in sequential recommendation such as pure ID-based SR, SR with side information, multi-modal SR, generative SR, LLM-powered SR, ultra-long SR and data-augmented SR. Moreover, we introduce some frontier research topics in sequential recommendation, e.g., open-domain SR, data-centric SR, could-edge collaborative SR, continuous SR, SR for good, and explainable SR. We believe that our survey could be served as a valuable roadmap for readers in this field.

Via

Access Paper or Ask Questions

Lossless and Privacy-Preserving Graph Convolution Network for Federated Item Recommendation

Dec 02, 2024

Guowei Wu, Weike Pan, Qiang Yang, Zhong Ming

Figure 1 for Lossless and Privacy-Preserving Graph Convolution Network for Federated Item Recommendation

Figure 2 for Lossless and Privacy-Preserving Graph Convolution Network for Federated Item Recommendation

Figure 3 for Lossless and Privacy-Preserving Graph Convolution Network for Federated Item Recommendation

Figure 4 for Lossless and Privacy-Preserving Graph Convolution Network for Federated Item Recommendation

Abstract:Graph neural network (GNN) has emerged as a state-of-the-art solution for item recommendation. However, existing GNN-based recommendation methods rely on a centralized storage of fragmented user-item interaction sub-graphs and training on an aggregated global graph, which will lead to privacy concerns. As a response, some recent works develop GNN-based federated recommendation methods by exploiting decentralized and fragmented user-item sub-graphs in order to preserve user privacy. However, due to privacy constraints, the graph convolution process in existing federated recommendation methods is incomplete compared with the centralized counterpart, causing a degradation of the recommendation performance. In this paper, we propose a novel lossless and privacy-preserving graph convolution network (LP-GCN), which fully completes the graph convolution process with decentralized user-item interaction sub-graphs while ensuring privacy. It is worth mentioning that its performance is equivalent to that of the non-federated (i.e., centralized) counterpart. Moreover, we validate its effectiveness through both theoretical analysis and empirical studies. Extensive experiments on three real-world datasets show that our LP-GCN outperforms the existing federated recommendation methods. The code will be publicly available once the paper is accepted.

Via

Access Paper or Ask Questions

Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

Jul 25, 2024

Shu Chen, Jinwei Luo, Weike Pan, Jiangxing Yu, Xin Huang, Zhong Ming

Figure 1 for Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

Figure 2 for Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

Figure 3 for Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

Figure 4 for Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

Abstract:Sequential recommendation leverages interaction sequences to predict forthcoming user behaviors, crucial for crafting personalized recommendations. However, the true preferences of a user are inherently complex and high-dimensional, while the observed data is merely a simplified and low-dimensional projection of the rich preferences, which often leads to prevalent issues like data sparsity and inaccurate model training. To learn true preferences from the sparse data, most existing works endeavor to introduce some extra information or design some ingenious models. Although they have shown to be effective, extra information usually increases the cost of data collection, and complex models may result in difficulty in deployment. Innovatively, we avoid the use of extra information or alterations to the model; instead, we fill the transformation space between the observed data and the underlying preferences with randomness. Specifically, we propose a novel model-agnostic and highly generic framework for sequential recommendation called sample enrichment via temporary operations on subsequences (SETO), which temporarily and separately enriches the transformation space via sequence enhancement operations with rationality constraints in training. The transformation space not only exists in the process from input samples to preferences but also in preferences to target samples. We highlight our SETO's effectiveness and versatility over multiple representative and state-of-the-art sequential recommendation models (including six single-domain sequential models and two cross-domain sequential models) across multiple real-world datasets (including three single-domain datasets, three cross-domain datasets and a large-scale industry dataset).

* 12 pages, 6 figures

Via

Access Paper or Ask Questions

A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation

Jun 01, 2024

Dugang Liu, Shenxian Xian, Xiaolin Lin, Xiaolian Zhang, Hong Zhu, Yuan Fang, Zhen Chen, Zhong Ming

Abstract:The training paradigm integrating large language models (LLM) is gradually reshaping sequential recommender systems (SRS) and has shown promising results. However, most existing LLM-enhanced methods rely on rich textual information on the item side and instance-level supervised fine-tuning (SFT) to inject collaborative information into LLM, which is inefficient and limited in many applications. To alleviate these problems, this paper proposes a novel practice-friendly two-stage LLM-enhanced paradigm (TSLRec) for SRS. Specifically, in the information reconstruction stage, we design a new user-level SFT task for collaborative information injection with the assistance of a pre-trained SRS model, which is more efficient and compatible with limited text information. We aim to let LLM try to infer the latent category of each item and reconstruct the corresponding user's preference distribution for all categories from the user's interaction sequence. In the information augmentation stage, we feed each item into LLM to obtain a set of enhanced embeddings that combine collaborative information and LLM inference capabilities. These embeddings can then be used to help train various future SRS models. Finally, we verify the effectiveness and efficiency of our TSLRec on three SRS benchmark datasets.

Via

Access Paper or Ask Questions

Benchmarking for Deep Uplift Modeling in Online Marketing

Jun 01, 2024

Dugang Liu, Xing Tang, Yang Qiao, Miao Liu, Zexu Sun, Xiuqiang He, Zhong Ming

Figure 1 for Benchmarking for Deep Uplift Modeling in Online Marketing

Figure 2 for Benchmarking for Deep Uplift Modeling in Online Marketing

Figure 3 for Benchmarking for Deep Uplift Modeling in Online Marketing

Figure 4 for Benchmarking for Deep Uplift Modeling in Online Marketing

Abstract:Online marketing is critical for many industrial platforms and business applications, aiming to increase user engagement and platform revenue by identifying corresponding delivery-sensitive groups for specific incentives, such as coupons and bonuses. As the scale and complexity of features in industrial scenarios increase, deep uplift modeling (DUM) as a promising technique has attracted increased research from academia and industry, resulting in various predictive models. However, current DUM still lacks some standardized benchmarks and unified evaluation protocols, which limit the reproducibility of experimental results in existing studies and the practical value and potential impact in this direction. In this paper, we provide an open benchmark for DUM and present comparison results of existing models in a reproducible and uniform manner. To this end, we conduct extensive experiments on two representative industrial datasets with different preprocessing settings to re-evaluate 13 existing models. Surprisingly, our experimental results show that the most recent work differs less than expected from traditional work in many cases. In addition, our experiments also reveal the limitations of DUM in generalization, especially for different preprocessing and test distributions. Our benchmarking work allows researchers to evaluate the performance of new models quickly but also reasonably demonstrates fair comparison results with existing models. It also gives practitioners valuable insights into often overlooked considerations when deploying DUM. We will make this benchmarking library, evaluation protocol, and experimental setup available on GitHub.

Via

Access Paper or Ask Questions

BMLP: Behavior-aware MLP for Heterogeneous Sequential Recommendation

Feb 20, 2024

Weixin Li, Yuhao Wu, Yang Liu, Weike Pan, Zhong Ming

Abstract:In real recommendation scenarios, users often have different types of behaviors, such as clicking and buying. Existing research methods show that it is possible to capture the heterogeneous interests of users through different types of behaviors. However, most multi-behavior approaches have limitations in learning the relationship between different behaviors. In this paper, we propose a novel multilayer perceptron (MLP)-based heterogeneous sequential recommendation method, namely behavior-aware multilayer perceptron (BMLP). Specifically, it has two main modules, including a heterogeneous interest perception (HIP) module, which models behaviors at multiple granularities through behavior types and transition relationships, and a purchase intent perception (PIP) module, which adaptively fuses subsequences of auxiliary behaviors to capture users' purchase intent. Compared with mainstream sequence models, MLP is competitive in terms of accuracy and has unique advantages in simplicity and efficiency. Extensive experiments show that BMLP achieves significant improvement over state-of-the-art algorithms on four public datasets. In addition, its pure MLP architecture leads to a linear time complexity.

Via

Access Paper or Ask Questions

Privacy-Preserving Cross-Domain Sequential Recommendation

Jan 27, 2024

Zhaohao Lin, Weike Pan, Zhong Ming

Abstract:Cross-domain sequential recommendation is an important development direction of recommender systems. It combines the characteristics of sequential recommender systems and cross-domain recommender systems, which can capture the dynamic preferences of users and alleviate the problem of cold-start users. However, in recent years, people pay more and more attention to their privacy. They do not want other people to know what they just bought, what videos they just watched, and where they just came from. How to protect the users' privacy has become an urgent problem to be solved. In this paper, we propose a novel privacy-preserving cross-domain sequential recommender system (PriCDSR), which can provide users with recommendation services while preserving their privacy at the same time. Specifically, we define a new differential privacy on the data, taking into account both the ID information and the order information. Then, we design a random mechanism that satisfies this differential privacy and provide its theoretical proof. Our PriCDSR is a non-invasive method that can adopt any cross-domain sequential recommender system as a base model without any modification to it. To the best of our knowledge, our PriCDSR is the first work to investigate privacy issues in cross-domain sequential recommender systems. We conduct experiments on three domains, and the results demonstrate that our PriCDSR, despite introducing noise, still outperforms recommender systems that only use data from a single domain.

Via

Access Paper or Ask Questions

A Survey on Cross-Domain Sequential Recommendation

Jan 19, 2024

Shu Chen, Zitao Xu, Weike Pan, Qiang Yang, Zhong Ming

Abstract:Cross-domain sequential recommendation (CDSR) shifts the modeling of user preferences from flat to stereoscopic by integrating and learning interaction information from multiple domains at different granularities (ranging from inter-sequence to intra-sequence and from single-domain to cross-domain). In this survey, we first define the CDSR problem using a four-dimensional tensor and then analyze its multi-type input representations under multidirectional dimensionality reductions. Following that, we provide a systematic overview from both macro and micro views. From a macro view, we abstract the multi-level fusion structures of various models across domains and discuss their bridges for fusion. From a micro view, focusing on the existing models, we specifically discuss the basic technologies and then explain the auxiliary learning technologies. Finally, we exhibit the available public datasets and the representative experimental results as well as provide some insights into future directions for research in CDSR.

Via

Access Paper or Ask Questions

A Survey on Multi-Behavior Sequential Recommendation

Aug 30, 2023

Xiaoqing Chen, Zhitao Li, Weike Pan, Zhong Ming

Abstract:Recommender systems is set up to address the issue of information overload in traditional information retrieval systems, which is focused on recommending information that is of most interest to users from massive information. Generally, there is a sequential nature and heterogeneity to the behavior of a person interacting with a system, leading to the proposal of multi-behavior sequential recommendation (MBSR). MBSR is a relatively new and worthy direction for in-depth research, which can achieve state-of-the-art recommendation through suitable modeling, and some related works have been proposed. This survey aims to shed light on the MBSR problem. Firstly, we introduce MBSR in detail, including its problem definition, application scenarios and challenges faced. Secondly, we detail the classification of MBSR, including neighborhood-based methods, matrix factorization-based methods and deep learning-based methods, where we further classify the deep learning-based methods into different learning architectures based on RNN, GNN, Transformer, and generic architectures as well as architectures that integrate hybrid techniques. In each method, we present related works based on the data perspective and the modeling perspective, as well as analyze the strengths, weaknesses and features of these works. Finally, we discuss some promising future research directions to address the challenges and improve the current status of MBSR.

Via

Access Paper or Ask Questions