Abstract:Click-Through Rate (CTR) prediction plays a vital role in recommender systems, online advertising, and search engines. Most current approaches model feature interactions through stacked or parallel structures, with some employing knowledge distillation for model compression. However, we observe several limitations in these approaches: (1) In parallel-structure models, the explicit and implicit components are executed independently and simultaneously, which leads to insufficient information sharing within the feature set. (2) Introducing knowledge distillation brings the problems of complex teacher-student framework design and inefficient knowledge transfer. (3) Both the dataset and the process of constructing high-order feature interactions contain significant noise, which limits the model's effectiveness. To address these limitations, we propose FSDNet, a CTR prediction framework incorporating a plug-and-play fusion self-distillation module. Specifically, FSDNet forms connections between explicit and implicit feature interactions at each layer, enhancing information sharing between different features. The deepest fusion layer is then used as the teacher model, guiding the training of the shallow layers through self-distillation. Empirical evaluation across four benchmark datasets validates the framework's efficacy and generalization capabilities. The code is available at https://anonymous.4open.science/r/FSDNet.
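As a concrete illustration of the fusion self-distillation objective described above, here is a minimal PyTorch sketch in which the deepest fusion layer serves as the teacher and each shallow fusion layer is trained against both the ground-truth labels and the teacher's soft predictions. The function names, weighting scheme, and alpha parameter are our own illustrative assumptions, not the FSDNet implementation.

```python
# Minimal sketch (not the authors' code) of fusion self-distillation:
# the deepest fusion layer acts as the teacher, and each shallow fusion
# layer is trained to match its output as well as the ground truth.
import torch
import torch.nn.functional as F

def self_distillation_loss(shallow_logits, teacher_logits, labels, alpha=0.5):
    """Combine the usual BCE on labels with a distillation term that pulls
    each shallow layer's prediction toward the deepest (teacher) layer."""
    teacher_prob = torch.sigmoid(teacher_logits).detach()  # stop-gradient teacher
    loss = F.binary_cross_entropy_with_logits(teacher_logits, labels)
    for logits in shallow_logits:
        hard = F.binary_cross_entropy_with_logits(logits, labels)
        soft = F.binary_cross_entropy_with_logits(logits, teacher_prob)
        loss = loss + alpha * hard + (1 - alpha) * soft
    return loss

# usage with three shallow fusion layers and one teacher head
labels = torch.randint(0, 2, (32, 1)).float()
teacher = torch.randn(32, 1, requires_grad=True)
shallow = [torch.randn(32, 1, requires_grad=True) for _ in range(3)]
print(self_distillation_loss(shallow, teacher, labels))
```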
Abstract:Data surveillance has become more covert and pervasive with AI algorithms, which can result in biased social classifications. Appearance offers intuitive identity signals, but what does it mean to let AI observe and speculate on them? We introduce AI-rays, an interactive installation in which AI generates speculative identities from participants' appearance, which are then expressed through synthesized personal items placed in participants' bags. It uses speculative X-ray visions to contrast reality with AI-generated assumptions, metaphorically highlighting AI's scrutiny and biases. Through a playful, immersive experience that explores AI biases, AI-rays promotes discussion of modern surveillance and the future of human-machine reality.
Abstract:Large language models (LLMs) have seen rapid improvement in recent years and are now used in a wide range of applications. After being trained on large text corpora, LLMs obtain the capability of extracting rich features from textual data. This capability is potentially useful for the web service recommendation task, where web users and services have intrinsic attributes that can be described using natural language sentences and are useful for recommendation. In this paper, we explore the possibility and practicality of using LLMs for web service recommendation. We propose the large language model aided QoS prediction (llmQoS) model, which uses LLMs to extract useful information from the attributes of web users and services via descriptive sentences. This information is then combined with the QoS values of historical user-service interactions to predict the QoS value for any given user-service pair. The proposed model is shown to overcome the data sparsity issue in QoS prediction. We show that on the WSDream dataset, llmQoS consistently outperforms comparable baseline models.
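To make the llmQoS pipeline concrete, below is a minimal PyTorch sketch of the prediction step: stand-in text embeddings (which in the paper would come from an LLM encoding descriptive sentences of users and services) are concatenated with ID embeddings learned from historical QoS interactions and fed to an MLP regressor. All module names and dimensions are illustrative assumptions.

```python
# Minimal sketch of the llmQoS idea as we read it: combine LLM-derived text
# features with collaborative ID embeddings to regress a QoS value.
import torch
import torch.nn as nn

class QoSPredictor(nn.Module):
    def __init__(self, n_users, n_services, text_dim=64, id_dim=32):
        super().__init__()
        self.user_id = nn.Embedding(n_users, id_dim)      # from historical interactions
        self.service_id = nn.Embedding(n_services, id_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * (text_dim + id_dim), 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, u, s, u_text_emb, s_text_emb):
        # u_text_emb / s_text_emb would come from an LLM encoding the
        # users'/services' descriptive sentences.
        x = torch.cat([self.user_id(u), u_text_emb,
                       self.service_id(s), s_text_emb], dim=-1)
        return self.mlp(x).squeeze(-1)

model = QoSPredictor(n_users=339, n_services=5825)
u, s = torch.tensor([0, 1]), torch.tensor([10, 20])
pred = model(u, s, torch.randn(2, 64), torch.randn(2, 64))
print(pred.shape)  # torch.Size([2])
```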
Abstract:Self-supervised learning (SSL) has recently attracted significant attention in the field of recommender systems. Contrastive learning (CL) stands out as a major SSL paradigm due to its robust ability to generate self-supervised signals. Mainstream graph contrastive learning (GCL)-based methods typically implement CL by creating contrastive views through various data augmentation techniques. Although these methods are effective, we argue that several challenges remain: i) data augmentation (e.g., discarding edges or adding noise) necessitates additional graph convolution (GCN) or modeling operations, which are highly time-consuming and can potentially harm embedding quality; ii) existing CL-based methods use traditional CL objectives to capture self-supervised signals, but few studies have explored obtaining CL objectives from multiple perspectives or fusing the varying signals from these objectives to enhance recommendation performance. To overcome these challenges, we propose a High-Order Fusion Graph Contrastive Learning (HFGCL) framework for recommendation. Specifically, we discard data augmentation and instead use high-order information from the GCN process to create contrastive views. Additionally, to integrate self-supervised signals from various CL objectives, we propose an advanced CL objective. By ensuring that positive pairs are distanced from the negative samples derived from both contrastive views, we effectively fuse self-supervised signals from distinct CL objectives, thereby enhancing the mutual information between positive pairs. Experimental results on three public datasets demonstrate the superior effectiveness of HFGCL compared to state-of-the-art baselines.
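The augmentation-free view construction can be illustrated with a short sketch: embeddings taken from two different propagation depths of the same GCN act as the two contrastive views, and a standard InfoNCE objective treats the same node across views as a positive pair. This is our reading of the abstract; the fused multi-objective loss of HFGCL itself is not reproduced here.

```python
# Minimal sketch of using two GCN propagation depths as contrastive views
# (no data augmentation) with an InfoNCE-style objective.
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.2):
    """Treat the same node in the two views as a positive pair and all
    other nodes in the batch as negatives."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(a.size(0))         # diagonal entries are positives
    return F.cross_entropy(logits, targets)

# "views" taken from a low-order and a high-order layer of the same GCN
emb_layer1 = torch.randn(256, 64, requires_grad=True)
emb_layer3 = torch.randn(256, 64, requires_grad=True)
print(info_nce(emb_layer1, emb_layer3))
```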
Abstract:Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graphs (HGs) due to their capacity to enhance the consistency of representations across different views. However, existing frameworks often neglect the fact that user-item interactions within an HG are governed by diverse latent intents (e.g., brand preferences or demographic characteristics of item audiences), which are pivotal for capturing fine-grained relations. Exploring these underlying intents, particularly through the lens of meta-paths in HGs, presents two principal challenges: i) how to integrate CL with intents; ii) how to mitigate noise from meta-path-driven intents. To address these challenges, we propose an innovative framework termed Intent-guided Heterogeneous Graph Contrastive Learning (IHGCL), which is designed to enhance CL-based recommendation by capturing the intents contained within meta-paths. Specifically, the IHGCL framework includes: i) a meta-path-based Dual Contrastive Learning (DCL) approach that effectively integrates intents into recommendation by constructing intent-intent and intent-interaction contrasts; ii) a Bottlenecked AutoEncoder (BAE) that combines mask propagation with the information bottleneck principle to significantly reduce the noise perturbations introduced by meta-paths. Empirical evaluations conducted across six distinct datasets demonstrate the superior performance of our IHGCL framework relative to conventional baseline methods. Our model implementation is available at https://github.com/wangyu0627/IHGCL.
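As a rough sketch of the dual contrastive component, the snippet below contrasts intent representations distilled from two meta-paths with each other (intent-intent) and with the plain interaction view (intent-interaction). The loss form and the weighting lam are illustrative assumptions, not IHGCL's exact objective.

```python
# Hedged sketch of a dual contrastive loss over meta-path intents.
import torch
import torch.nn.functional as F

def nce(a, b, t=0.2):
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / t
    return F.cross_entropy(logits, torch.arange(a.size(0)))

def dual_contrastive_loss(intent_mp1, intent_mp2, interaction, lam=0.5):
    ii = nce(intent_mp1, intent_mp2)  # intent-intent contrast
    # intent-interaction contrast, averaged over the two meta-path views
    ix = 0.5 * (nce(intent_mp1, interaction) + nce(intent_mp2, interaction))
    return ii + lam * ix

u = torch.randn(128, 64)
print(dual_contrastive_loss(u, torch.randn(128, 64), torch.randn(128, 64)))
```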
Abstract:Deep & Cross Network and its derivative models have become an important paradigm in click-through rate (CTR) prediction due to their effective balance between computational cost and performance. However, these models face four major limitations: (1) while most models claim to capture high-order feature interactions, they often do so implicitly and non-interpretably through deep neural networks (DNN), which limits the trustworthiness of the model's predictions; (2) the performance of existing explicit feature interaction methods is often weaker than that of implicit DNNs, undermining their necessity; (3) many models fail to adaptively filter noise while increasing the order of feature interactions; (4) the fusion methods of most models cannot provide suitable supervision signals for their different interaction methods. To address these limitations, this paper proposes the next-generation Deep Cross Network (DCNv3) and Shallow & Deep Cross Network (SDCNv3). These models ensure interpretability in feature interaction modeling while exponentially increasing the order of feature interactions to achieve genuine Deep Crossing rather than just Deep & Cross. Additionally, we employ a Self-Mask operation to filter noise and halve the number of parameters in the cross network. In the fusion layer, we use a simple yet effective loss-weight calculation method called Tri-BCE to provide appropriate supervision signals. Comprehensive experiments on six datasets demonstrate the effectiveness, efficiency, and interpretability of DCNv3 and SDCNv3. The code, running logs, and detailed hyperparameter configurations are available at: https://anonymous.4open.science/r/DCNv3-E352.
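One plausible form of a Tri-BCE-style fusion loss is sketched below: the fused prediction and the two sub-network predictions each receive a BCE term, with each branch weighted by how much it lags behind the fused loss. The weighting rule here is our hedged reading of the abstract and may differ from the released DCNv3/SDCNv3 code.

```python
# Hedged sketch of a Tri-BCE-style loss with adaptive branch weights.
import torch
import torch.nn.functional as F

def tri_bce(fused_logits, deep_logits, shallow_logits, labels):
    bce = lambda z: F.binary_cross_entropy_with_logits(z, labels, reduction="none")
    l_fused, l_deep, l_shallow = bce(fused_logits), bce(deep_logits), bce(shallow_logits)
    # auxiliary weights: only penalize a branch where it is worse than the fusion
    w_deep = (l_deep - l_fused).clamp(min=0).detach()
    w_shallow = (l_shallow - l_fused).clamp(min=0).detach()
    return (l_fused + w_deep * l_deep + w_shallow * l_shallow).mean()

labels = torch.randint(0, 2, (32, 1)).float()
print(tri_bce(torch.randn(32, 1), torch.randn(32, 1), torch.randn(32, 1), labels))
```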
Abstract:Medical image synthesis remains challenging due to misalignment noise during training. Existing methods have attempted to address this challenge by incorporating a registration-guided module. However, these methods tend to overlook the task-specific constraints on the synthesis and registration modules, which may cause the synthesis module to generate images that are spatially aligned with the misaligned target images during training, regardless of the registration module's function. Therefore, this paper proposes registration-guided consistency with disentanglement learning for medical image synthesis. The proposed registration-guided consistency architecture fosters task-specificity within the synthesis and registration modules by applying identical deformation fields before and after synthesis, while enforcing output consistency through an alignment loss. Moreover, the synthesis module is designed to disentangle anatomical structures from the specific styles of the various modalities. An anatomy consistency loss is introduced to further compel the synthesis module to preserve geometric integrity within latent spaces. Experiments conducted on both an in-house abdominal CECT-CT dataset and a publicly available pelvic MR-CT dataset demonstrate the superiority of the proposed method.
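The registration-guided consistency constraint can be sketched as follows: applying the same deformation field before and after synthesis should yield consistent outputs, which keeps the synthesis network from silently compensating for misalignment. In the sketch, grid_sample stands in for warping by a deformation field, and both the synthesis network and the deformation are toy stand-ins.

```python
# Hedged sketch of the registration-guided consistency constraint:
# ||S(warp(x)) - warp(S(x))||_1 with the identical deformation field
# applied before and after synthesis.
import torch
import torch.nn.functional as F

def warp(img, grid):
    return F.grid_sample(img, grid, align_corners=False)

def consistency_loss(synth, img, grid):
    return F.l1_loss(synth(warp(img, grid)), warp(synth(img), grid))

synth = torch.nn.Conv2d(1, 1, 3, padding=1)       # stand-in synthesis network
img = torch.randn(2, 1, 64, 64)
identity = F.affine_grid(torch.eye(2, 3).repeat(2, 1, 1), img.shape,
                         align_corners=False)
grid = identity + 0.01 * torch.randn_like(identity)  # small random deformation
print(consistency_loss(synth, img, grid))
```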
Abstract:Intent modeling has attracted widespread attention in recommender systems. As the core motivation behind user selection of items, intent is crucial for elucidating recommendation results. The current mainstream modeling method abstracts intent into unknowable but learnable shared or non-shared parameters. Despite considerable progress, we argue that it still confronts the following challenges: first, these methods capture only the coarse-grained aspects of intent, ignoring the fact that user-item interactions are affected by both collective and individual factors (e.g., a user may choose a movie because of its high box office or because of his own unique preferences); second, modeling believable intent is severely hampered by implicit feedback, which is extremely sparse and devoid of true semantics. To address these challenges, we propose a novel recommendation framework termed Bilateral Intent-guided Graph Collaborative Filtering (BIGCF). Specifically, we take a closer look at user-item interactions from a causal perspective and put forth the concepts of individual intent, which signifies private preferences, and collective intent, which denotes overall awareness. To counter the sparsity of implicit feedback, the feature distributions of users and items are encoded via a Gaussian-based graph generation strategy, and we implement the recommendation process through bilateral intent-guided graph reconstruction re-sampling. Finally, we propose graph contrastive regularization for both the interaction and intent spaces to uniformize users, items, intents, and interactions in a self-supervised and non-augmented paradigm. Experimental results on three real-world datasets demonstrate the effectiveness of BIGCF compared with existing solutions.
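The Gaussian-based graph generation strategy can be illustrated with the standard reparameterization trick: each user and item is encoded as a Gaussian (mu, logvar), re-sampled embeddings are drawn from it, and the interaction graph is reconstructed from their inner products. This is a generic sketch under those assumptions, not BIGCF's implementation.

```python
# Generic sketch of Gaussian encoding + re-sampling for graph reconstruction.
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)   # z ~ N(mu, sigma^2)

user_mu, user_logvar = torch.randn(100, 64), torch.randn(100, 64)
item_mu, item_logvar = torch.randn(500, 64), torch.randn(500, 64)

z_u = reparameterize(user_mu, user_logvar)
z_i = reparameterize(item_mu, item_logvar)
scores = z_u @ z_i.t()                         # reconstructed interaction graph
print(scores.shape)                            # torch.Size([100, 500])
```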
Abstract:Social recommendation leverages social networks to complement user-item interaction data, aiming to mitigate the data sparsity issue in recommender systems. However, existing social recommendation methods face the following challenge: both social network and interaction data contain substantial noise, and the propagation of such noise through Graph Neural Networks (GNNs) not only fails to enhance recommendation performance but may also interfere with the model's normal training. Despite the importance of denoising both social network and interaction data, only a limited number of studies have considered denoising for the social network, and all of them overlook denoising for interaction data, hindering both the denoising effect and recommendation performance. Motivated by this, we propose a novel model called Dual-domain Collaborative Denoising for Social Recommendation ($\textbf{DCDSR}$). DCDSR comprises two primary modules: a structure-level collaborative denoising module and an embedding-space collaborative denoising module. In the structure-level collaborative denoising module, information from the interaction domain is first employed to guide social network denoising; the denoised social network is then used to supervise the denoising of interaction data. The embedding-space collaborative denoising module resists cross-domain noise diffusion through contrastive learning with dual-domain collaborative embedding perturbation. Additionally, a novel contrastive learning strategy, named Anchor-InfoNCE, is introduced to better harness the denoising capability of contrastive learning. Evaluating our model on three real-world datasets verifies that DCDSR achieves a considerable denoising effect and thus outperforms state-of-the-art social recommendation methods.
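Since the abstract does not give the form of Anchor-InfoNCE, the following is only a hedged guess at an anchor-style objective: the unperturbed embedding acts as the anchor, and the two collaboratively perturbed dual-domain views serve as its positives under a standard InfoNCE formulation. It is not necessarily DCDSR's exact objective.

```python
# Hedged guess at an anchor-style InfoNCE over two perturbed views.
import torch
import torch.nn.functional as F

def anchor_infonce(anchor, view_a, view_b, t=0.2):
    a = F.normalize(anchor, dim=-1)
    pos = [F.normalize(v, dim=-1) for v in (view_a, view_b)]
    targets = torch.arange(a.size(0))   # same node across views is positive
    return sum(F.cross_entropy(a @ p.t() / t, targets) for p in pos)

anchor = torch.randn(128, 64, requires_grad=True)
print(anchor_infonce(anchor,
                     anchor + 0.1 * torch.randn_like(anchor),
                     anchor + 0.1 * torch.randn_like(anchor)))
```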
Abstract:Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are undifferentiated, which may lead the model to focus single-mindedly on the larger number of easy samples while ignoring the smaller number of hard samples, reducing its generalization ability; (2) differentiated feature interaction encoders are designed to capture different interaction information but receive identical supervision signals, thereby limiting their effectiveness. To bridge these gaps, this paper introduces a novel CTR prediction framework, the Twin Focus Framework for CTR (TF4CTR), which integrates a plug-and-play Twin Focus (TF) Loss, a Sample Selection Embedding Module (SSEM), and a Dynamic Fusion Module (DFM). Specifically, the framework employs the SSEM at the bottom of the model to differentiate between samples, thereby assigning a more suitable encoder to each sample. Meanwhile, the TF Loss provides tailored supervision signals to both the simple and complex encoders. Moreover, the DFM dynamically fuses the feature interaction information captured by the encoders, resulting in more accurate predictions. Experiments on five real-world datasets confirm the effectiveness and compatibility of the framework, demonstrating its capacity to enhance various representative baselines in a model-agnostic manner. To facilitate reproducible research, our open-sourced code and detailed running logs will be made available at: https://github.com/salmon1802/TF4CTR.
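One way the Twin Focus idea could be realized is with a focal-style modulation, sketched below: easy samples (those with a high probability assigned to the true class) up-weight the simple encoder's loss, while hard samples up-weight the complex encoder's loss. The gamma parameter and the difficulty measure are illustrative assumptions, not the paper's exact TF Loss.

```python
# Hedged sketch of a Twin Focus-style loss with focal modulation.
import torch
import torch.nn.functional as F

def twin_focus_loss(simple_logits, complex_logits, labels, gamma=2.0):
    bce = lambda z: F.binary_cross_entropy_with_logits(z, labels, reduction="none")
    p_simple = torch.sigmoid(simple_logits)
    pt = labels * p_simple + (1 - labels) * (1 - p_simple)  # prob of true class
    easy_w, hard_w = pt.detach() ** gamma, (1 - pt.detach()) ** gamma
    # easy samples drive the simple encoder; hard samples the complex encoder
    return (easy_w * bce(simple_logits) + hard_w * bce(complex_logits)).mean()

labels = torch.randint(0, 2, (32, 1)).float()
print(twin_focus_loss(torch.randn(32, 1), torch.randn(32, 1), labels))
```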