Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wenwen Gong

RecDCL: Dual Contrastive Learning for Recommendation

Jan 28, 2024

Dan Zhang, Yangliao Geng, Wenwen Gong, Zhongang Qi, Zhiyu Chen, Xing Tang, Ying Shan, Yuxiao Dong, Jie Tang

Figure 1 for RecDCL: Dual Contrastive Learning for Recommendation

Figure 2 for RecDCL: Dual Contrastive Learning for Recommendation

Figure 3 for RecDCL: Dual Contrastive Learning for Recommendation

Figure 4 for RecDCL: Dual Contrastive Learning for Recommendation

Abstract:Self-supervised recommendation (SSR) has achieved great success in mining the potential interacted behaviors for collaborative filtering in recent years. As a major branch, Contrastive Learning (CL) based SSR conquers data sparsity in Web platforms by contrasting the embedding between raw data and augmented data. However, existing CL-based SSR methods mostly focus on contrasting in a batch-wise way, failing to exploit potential regularity in the feature-wise dimension, leading to redundant solutions during the representation learning process of users (items) from Websites. Furthermore, the joint benefits of utilizing both Batch-wise CL (BCL) and Feature-wise CL (FCL) for recommendations remain underexplored. To address these issues, we investigate the relationship of objectives between BCL and FCL. Our study suggests a cooperative benefit of employing both methods, as evidenced from theoretical and experimental perspectives. Based on these insights, we propose a dual CL method for recommendation, referred to as RecDCL. RecDCL first eliminates redundant solutions on user-item positive pairs in a feature-wise manner. It then optimizes the uniform distributions within users and items using a polynomial kernel from an FCL perspective. Finally, it generates contrastive embedding on output vectors in a batch-wise objective. We conduct experiments on four widely-used benchmarks and an industrial dataset. The results consistently demonstrate that the proposed RecDCL outperforms the state-of-the-art GNNs-based and SSL-based models (with up to a 5.65\% improvement in terms of Recall@20), thereby confirming the effectiveness of the joint-wise objective. All source codes used in this paper are publicly available at \url{https://github.com/THUDM/RecDCL}}.

* Proceedings of TheWebConf 2024 (WWW '24), May 13--17, 2024, Singapore
* Accepted to WWW 2024

Via

Access Paper or Ask Questions

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Jun 11, 2023

Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao(+2 more)

Figure 1 for GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Figure 2 for GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Figure 3 for GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Figure 4 for GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Abstract:Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices. However, the deployment of knowledge distillation systems faces great challenges in real-world industrial-strength applications, which require the use of complex distillation methods on even larger-scale PLMs (over 10B), limited by memory on GPUs and the switching of methods. To overcome these challenges, we propose GKD, a general knowledge distillation framework that supports distillation on larger-scale PLMs using various distillation methods. With GKD, developers can build larger distillation models on memory-limited GPUs and easily switch and combine different distillation methods within a single framework. Experimental results show that GKD can support the distillation of at least 100B-scale PLMs and 25 mainstream methods on 8 NVIDIA A100 (40GB) GPUs.

* accepted for ACL 2023 industry track

Via

Access Paper or Ask Questions

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Jun 11, 2023

Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shu Zhao, Peng Zhang, Jie Tang

Figure 1 for Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Figure 2 for Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Figure 3 for Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Figure 4 for Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Abstract:The large scale of pre-trained language models poses a challenge for their deployment on various devices, with a growing emphasis on methods to compress these models, particularly knowledge distillation. However, current knowledge distillation methods rely on the model's intermediate layer features and the golden labels (also called hard labels), which usually require aligned model architecture and enough labeled data respectively. Moreover, the parameters of vocabulary are usually neglected in existing methods. To address these problems, we propose a general language model distillation (GLMD) method that performs two-stage word prediction distillation and vocabulary compression, which is simple and surprisingly shows extremely strong performance. Specifically, GLMD supports more general application scenarios by eliminating the constraints of dimension and structure between models and the need for labeled datasets through the absence of intermediate layers and golden labels. Meanwhile, based on the long-tailed distribution of word frequencies in the data, GLMD designs a strategy of vocabulary compression through decreasing vocabulary size instead of dimensionality. Experimental results show that our method outperforms 25 state-of-the-art methods on the SuperGLUE benchmark, achieving an average score that surpasses the best method by 3%.

* Accepted to Findings of ACL2023

Via

Access Paper or Ask Questions