Abstract: Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved promising performance on the task of open-domain question answering (QA). Their effectiveness can be pushed to new state-of-the-art levels by incorporating cross-architecture knowledge distillation. However, most existing studies directly apply conventional distillation methods and fail to account for the particular setting in which the teacher and student have different structures. In this paper, we propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders. Our method 1) introduces a self on-the-fly distillation method that can effectively distill a late-interaction model (i.e., ColBERT) into a vanilla dual-encoder, and 2) incorporates a cascade distillation process to further improve the performance with a cross-encoder teacher. Extensive experiments validate that our proposed solution outperforms strong baselines and establishes a new state of the art on open-domain QA benchmarks.
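The core idea above, distilling a late-interaction teacher into a dual-encoder student, can be illustrated with a small sketch. The snippet below is a hypothetical PyTorch implementation of a listwise distillation loss over in-batch candidates; the function names, tensor shapes, and the KL-divergence objective are illustrative assumptions, not the paper's actual training code.

```python
# Hypothetical sketch: distilling a late-interaction (ColBERT-style) teacher score
# into a vanilla dual-encoder student with a listwise KL-divergence loss.
# All names and shapes are illustrative, not the paper's implementation.
import torch
import torch.nn.functional as F

def late_interaction_scores(q_tok, p_tok):
    # q_tok: [B, Lq, d] query token embeddings; p_tok: [B, Lp, d] passage token embeddings.
    # MaxSim over passage tokens, summed over query tokens (ColBERT-style scoring).
    sim = torch.einsum("qid,pjd->qpij", q_tok, p_tok)   # [B, B, Lq, Lp] in-batch similarities
    return sim.max(dim=-1).values.sum(dim=-1)           # [B, B] query-passage score matrix

def dual_encoder_scores(q_cls, p_cls):
    # q_cls, p_cls: [B, d] pooled (e.g. [CLS]) embeddings; dot-product relevance.
    return q_cls @ p_cls.t()                             # [B, B]

def distill_loss(q_tok, p_tok, q_cls, p_cls, temperature=1.0):
    # Teacher scores are computed without gradients; the student matches their
    # softened distribution over the in-batch candidates.
    with torch.no_grad():
        teacher = late_interaction_scores(q_tok, p_tok)
    student = dual_encoder_scores(q_cls, p_cls)
    t = F.softmax(teacher / temperature, dim=-1)
    s = F.log_softmax(student / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean")
```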
Abstract: Pre-trained models (PTMs) have become a fundamental backbone for downstream tasks in natural language processing and computer vision. Although applying generic PTMs to geo-related tasks at Baidu Maps yielded initial gains, a clear performance plateau was observed over time. One of the main reasons for this plateau is the lack of readily available geographic knowledge in generic PTMs. To address this problem, in this paper, we present ERNIE-GeoL, a geography-and-language pre-trained model designed and developed to improve geo-related tasks at Baidu Maps. ERNIE-GeoL is carefully designed to learn a universal geography-language representation by pre-training on large-scale data generated from a heterogeneous graph that contains abundant geographic knowledge. Extensive quantitative and qualitative experiments conducted on large-scale real-world datasets demonstrate the superiority and effectiveness of ERNIE-GeoL. ERNIE-GeoL has been deployed in production at Baidu Maps since April 2021, where it significantly improves the performance of a wide range of downstream tasks, demonstrating that it can serve as a fundamental backbone for geo-related tasks.
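As a rough illustration of generating pre-training data from a heterogeneous graph, the sketch below builds a toy POI graph and turns random walks over it into text sequences. The graph schema, node attributes, and walk procedure are assumptions for exposition only and do not reflect ERNIE-GeoL's actual data pipeline.

```python
# Hypothetical sketch: turning a toy heterogeneous POI graph into text sequences
# for masked-language-model pre-training. Purely illustrative, not ERNIE-GeoL's pipeline.
import random

# Toy graph: POI id -> (textual attribute, neighbor ids)
graph = {
    "poi_1": ("Baidu Technology Park, Haidian District", ["poi_2", "poi_3"]),
    "poi_2": ("Xierqi Subway Station", ["poi_1"]),
    "poi_3": ("Shangdi Software Park", ["poi_1", "poi_2"]),
}

def random_walk(start, length=3):
    # Walk over the graph and concatenate node texts into one training sequence.
    node, texts = start, []
    for _ in range(length):
        texts.append(graph[node][0])
        node = random.choice(graph[node][1])
    return " [SEP] ".join(texts)

# One walk per node yields a small corpus of geography-aware text sequences.
corpus = [random_walk(node) for node in graph]
```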
Abstract: WikiKG90M in KDD Cup 2021 is a large encyclopedic knowledge graph, which could benefit various downstream applications such as question answering and recommender systems. Participants are invited to complete the knowledge graph by predicting missing triplets. Recent representation learning methods have achieved great success on standard datasets such as FB15k-237. We therefore train advanced algorithms from different domains, including OTE, QuatE, RotatE, and TransE, to learn the triplets. Notably, we modify OTE into NOTE (short for Norm-OTE) for better performance. In addition, we use both DeepWalk and a post-smoothing technique to capture the graph structure as a supplement. Beyond these representations, we also use various statistical probabilities among head entities, relations, and tail entities for the final prediction. Experimental results show that an ensemble of state-of-the-art representation learning methods can draw on each other's strengths. We further develop feature engineering from the validation candidates for additional improvement and apply the same strategy to the test set for final inference. Note that these features may not be practical in the real world, where candidates must be ranked against all entities.
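To make the ensembling of heterogeneous KG-embedding models concrete, the sketch below scores candidate tail entities with TransE- and RotatE-style functions and combines the models by averaging candidate ranks. The scoring details and the rank-averaging rule are illustrative assumptions, not the submission's actual ensemble.

```python
# Minimal sketch of ensembling several KG-embedding models by averaging candidate ranks.
# Score functions and the combination rule are illustrative only.
import numpy as np

def transe_score(h, r, t):
    # TransE: negative distance between (h + r) and t; higher is better.
    return -np.linalg.norm(h + r - t, axis=-1)

def rotate_score(h, r_phase, t):
    # RotatE sketch: treat h, t as complex vectors (even embedding dim assumed)
    # and r_phase as rotation angles of half that dimensionality.
    h_c = h[..., ::2] + 1j * h[..., 1::2]
    t_c = t[..., ::2] + 1j * t[..., 1::2]
    return -np.abs(h_c * np.exp(1j * r_phase) - t_c).sum(axis=-1)

def ensemble_ranks(score_lists):
    # score_lists: one [num_candidates] score array per model; average their ranks.
    ranks = [(-scores).argsort().argsort() for scores in score_lists]
    return np.mean(ranks, axis=0)   # lower averaged rank = better candidate
```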
Abstract: Graph convolutional networks (GCN) and label propagation algorithms (LPA) are both message passing algorithms and have achieved superior performance in semi-supervised classification. GCN performs feature propagation through a neural network to make predictions, while LPA propagates labels across the graph adjacency matrix to obtain results; however, there is still no good way to combine the two. In this paper, we propose a new Unified Message Passing model (UniMP) that incorporates feature propagation and label propagation within a shared message passing network, providing better performance in semi-supervised classification. First, we adopt a Graph Transformer network together with label embeddings to propagate both feature and label information. Second, to train UniMP without overfitting to self-loop label information, we propose a masked label prediction method, in which a percentage of the training labels is masked at random and then predicted. UniMP conceptually unifies feature propagation and label propagation and is empirically powerful. It obtains new state-of-the-art semi-supervised classification results on the Open Graph Benchmark (OGB). Our implementation is available online at https://github.com/PaddlePaddle/PGL/tree/main/ogb_examples/nodeproppred/unimp.
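The masked label prediction idea can be sketched as follows: a random fraction of training labels is hidden and used only as prediction targets, while the remaining labels are fed to the model as extra inputs alongside node features. The helper below is a hypothetical NumPy illustration; the masking rate, the sentinel value, and the function interface are assumptions, not UniMP's actual code.

```python
# Hedged sketch of masked label prediction: hide a random fraction of training labels
# (prediction targets) and feed the rest to the model as observed label inputs.
import numpy as np

def mask_labels(train_idx, labels, mask_rate=0.5, rng=np.random.default_rng(0)):
    # Split training nodes into "observed label" nodes and "masked" nodes to be predicted.
    perm = rng.permutation(train_idx)
    n_masked = int(len(perm) * mask_rate)
    masked_idx, observed_idx = perm[:n_masked], perm[n_masked:]

    input_labels = np.full(len(labels), -1)      # -1 marks "no label fed to the model"
    input_labels[observed_idx] = labels[observed_idx]
    # The model sees input_labels as extra features and is supervised only on masked_idx.
    return input_labels, masked_idx
```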