Tong Guo

Query-dominant User Interest Network for Large-Scale Search Ranking

Oct 10, 2023

Tong Guo, Xuanping Li, Haitao Yang, Xiao Liang, Yong Yuan, Jingyou Hou, Bingqing Ke, Chao Zhang, Junlin He, Shunyu Zhang (+2 more)

Abstract: Historical behaviors have shown great effect and potential in various prediction tasks, including recommendation and information retrieval. Overall historical behaviors are varied but noisy, while search behaviors are always sparse. Most existing approaches to personalized search ranking use the sparse search behaviors to learn representations, which creates a bottleneck and does not sufficiently exploit the crucial long-term interest. User long-term interest is varied but noisy for instant search, and how to exploit it well remains an open problem. To tackle this problem, we propose a novel model named Query-dominant user Interest Network (QIN), which includes two cascade units to filter the raw user behaviors and reweight the behavior subsequences. Specifically, we propose a Relevance Search Unit (RSU), which first searches for a subsequence relevant to the query and then searches for the sub-subsequence relevant to the target item. These items are then fed into an attention unit called the Fused Attention Unit (FAU), which is designed to calculate attention scores from the ID field and the attribute field separately and then adaptively fuse the item embedding and content embedding based on the user's engagement over a past period. Extensive experiments and ablation studies on real-world datasets demonstrate the superiority of our model over state-of-the-art methods. QIN has been successfully deployed on Kuaishou search, an online video search platform, and obtained a 7.6% improvement in CTR.

* 10 pages
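
The two cascade units described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product relevance score, the hard top-k cut, the field names `id_emb` and `attr_emb`, and the fixed scalar `gate` (standing in for the adaptive, engagement-based fusion) are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(behaviors, anchor, k, field):
    """Keep the k behaviors most similar to anchor, in their original order."""
    ranked = sorted(
        range(len(behaviors)),
        key=lambda i: dot(behaviors[i][field], anchor),
        reverse=True,
    )[:k]
    return [behaviors[i] for i in sorted(ranked)]


def cascade_search(behaviors, query_emb, target_emb, k1, k2, field="attr_emb"):
    """Hypothetical RSU: filter by the query first, then by the target item."""
    by_query = top_k(behaviors, query_emb, k1, field)
    return top_k(by_query, target_emb, k2, field)


def fused_attention(behaviors, target_id_emb, target_attr_emb, gate):
    """Hypothetical FAU: score the ID and attribute fields separately, then mix.

    gate is a fixed scalar in [0, 1] here; the paper describes an adaptive
    fusion, which this sketch does not model.
    """
    id_weights = softmax([dot(b["id_emb"], target_id_emb) for b in behaviors])
    attr_weights = softmax([dot(b["attr_emb"], target_attr_emb) for b in behaviors])
    return [gate * i + (1 - gate) * a for i, a in zip(id_weights, attr_weights)]
```

Because each field's scores pass through their own softmax before mixing, the fused weights still sum to one for any `gate` between 0 and 1.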

The Re-Label Method For Data-Centric Machine Learning

Feb 09, 2023

Abstract: In industrial deep learning applications, manually labeled data contains a certain amount of noise. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method to find the noisy data and have humans re-label it, given the model predictions as references during labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.
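
One plausible reading of the re-label step is sketched below for a classification task. The abstract does not state how noisy examples are detected, so the disagreement test between the model prediction and the human label is an assumption, as are the function name `find_relabel_candidates` and the `predict` callable.

```python
def find_relabel_candidates(examples, predict):
    """Return examples whose human label disagrees with the model prediction.

    examples: list of dicts with "text" and "label" keys (hypothetical schema).
    predict:  any callable mapping a text to a predicted label.
    Each candidate carries the model prediction so that an annotator can use
    it as a reference when re-labeling.
    """
    candidates = []
    for example in examples:
        predicted = predict(example["text"])
        if predicted != example["label"]:
            candidates.append({**example, "model_prediction": predicted})
    return candidates
```

Only the flagged examples go back to annotators, which keeps the manual effort proportional to the amount of suspected noise and not to the size of the dataset.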

A Comprehensive Exploration of Pre-training Language Models

Jun 22, 2021

Abstract: Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to a new state of the art. In this paper we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models with the same amount of text and the same number of training steps. The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for the transformer encoder layers.

* work in progress

Learning From How Human Correct

Jan 30, 2021

Abstract: In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy data and relabel it manually, collecting the correction information in the process. We then present a novel method to incorporate the human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the deep learning model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in this dataset for our industry application. The experimental results show that our method improves the classification accuracy from 91.7% to 92.5%. The 91.7% baseline is BERT trained on the corrected dataset, which is hard to surpass.

Predictions For Pre-training Language Models

Nov 18, 2020

Abstract: Language model pre-training has proven to be useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the specific task's loss in the pre-training step. In industrial NLP applications, we have a large amount of data produced by users. We use the fine-tuned model to give the user-generated unlabeled data a pseudo-label. We then use the pseudo-label for the task-specific loss, together with the masked language model loss, to pre-train. The experiment shows that using the fine-tuned model's predictions for pseudo-labeled pre-training offers further gains in the downstream task. The improvement from our method is stable and remarkable.

* 5 pages
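
The pre-training objective described in the abstract above can be sketched as the sum of a masked language model loss and a task loss computed against a pseudo-label. This is a simplified illustration on plain probability lists and not the paper's code; the argmax pseudo-labeling rule and the `task_weight` parameter are assumptions.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under probs."""
    return -math.log(probs[target_index])


def pseudo_label(fine_tuned_probs):
    """Take the fine-tuned model's most likely class as the pseudo-label."""
    return max(range(len(fine_tuned_probs)), key=lambda i: fine_tuned_probs[i])


def pretraining_loss(mlm_probs, masked_token_id, task_probs, fine_tuned_probs,
                     task_weight=1.0):
    """Masked-LM loss plus a task loss against the pseudo-label.

    mlm_probs:        pre-training model's distribution over the vocabulary
                      at one masked position.
    task_probs:       pre-training model's distribution over task classes.
    fine_tuned_probs: fine-tuned model's distribution over task classes,
                      used only to derive the pseudo-label.
    task_weight:      hypothetical weighting term, not taken from the paper.
    """
    label = pseudo_label(fine_tuned_probs)
    mlm_loss = cross_entropy(mlm_probs, masked_token_id)
    task_loss = cross_entropy(task_probs, label)
    return mlm_loss + task_weight * task_loss
```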

Content Enhanced BERT-based Text-to-SQL Generation

Nov 27, 2019

Abstract: We present a simple method to leverage the table content for a BERT-based model to solve the text-to-SQL problem. Based on the observation that some of the table content matches some words in the question string, and some of the table headers also match some words in the question string, we encode two additional feature vectors for the deep model. Our method also benefits model inference at test time, as the tables are almost the same at training and test time. We test our model on the WikiSQL dataset and outperform the BERT-based baseline by 3.7% in logic form accuracy and 3.7% in execution accuracy, achieving state-of-the-art results.

* work in progress
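
The two matching features mentioned in the abstract above might look like the following sketch. The exact matching rule in the paper is not given here, so whitespace tokenization, case-insensitive exact word matching, and the function name `match_features` are assumptions.

```python
def match_features(question, headers, rows):
    """Build two binary feature vectors from question/table overlap.

    question_vec[i] is 1 when the i-th question token appears in some cell.
    header_vec[j]   is 1 when any word of the j-th header appears in the question.
    """
    tokens = question.lower().split()
    token_set = set(tokens)
    cell_words = {
        word
        for row in rows
        for cell in row
        for word in str(cell).lower().split()
    }
    question_vec = [1 if token in cell_words else 0 for token in tokens]
    header_vec = [
        1 if any(word in token_set for word in header.lower().split()) else 0
        for header in headers
    ]
    return question_vec, header_vec
```

For the question "what position does john smith play" over a table with headers Player, Position, Team and a row containing "John Smith", the sketch marks the tokens "john" and "smith" in the question vector and the Position column in the header vector.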

Revisiting Semantic Representation and Tree Search for Similar Question Retrieval

Sep 06, 2019

Abstract: This paper studies the performance of BERT combined with a tree structure in the short sentence ranking task. In a retrieval-based question answering system, we retrieve the question most similar to the query question by ranking all the questions in the dataset. Ranking all the sentences with neural rankers requires scoring all the sentence pairs, which consumes a large amount of time. We therefore design a specific tree for searching and combine it with a deep model to solve this problem. We fine-tune BERT on the training data to get semantic vectors, or sentence embeddings, on the test data. We use all the sentence embeddings of the test data to build our tree based on k-means and do beam search at prediction time when given a sentence as a query. We run the experiments on the semantic textual similarity dataset Quora Question Pairs and process the dataset for sentence ranking. Experimental results show that our methods outperform the strong baseline. Our tree accelerates prediction speed by 500%-1000% without losing too much ranking accuracy.
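
The idea of a k-means tree with beam search can be sketched as follows. This is a one-level simplification and not the paper's tree: a deeper tree would apply the clustering recursively. The deterministic initialization from the first k vectors and the squared Euclidean distance are assumptions chosen to keep the example short and reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def kmeans(vectors, k, iterations=10):
    """Plain k-means; returns centroids and the member indices of each cluster."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for index, vector in enumerate(vectors):
            nearest = min(range(k), key=lambda c: sq_dist(vector, centroids[c]))
            clusters[nearest].append(index)
        centroids = [
            mean([vectors[i] for i in members]) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids, clusters


def beam_search(query, vectors, centroids, clusters, beam=1):
    """Score only the members of the beam nearest clusters, not every vector."""
    nearest_clusters = sorted(
        range(len(centroids)), key=lambda c: sq_dist(query, centroids[c])
    )[:beam]
    candidates = [i for c in nearest_clusters for i in clusters[c]]
    return min(candidates, key=lambda i: sq_dist(query, vectors[i]))
```

A wider beam visits more clusters, trading speed for a lower chance of missing the true nearest neighbor when it falls just across a cluster boundary.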

Using Database Rule for Weak Supervised Text-to-SQL Generation

Jul 31, 2019

Abstract: We present a simple way to solve the text-to-SQL problem with weak supervision. We call it Rule-SQL. Given the question and the answer from the database table, without the SQL logic form, Rule-SQL first uses rules based on table column names and the question string to explore candidate SQL, and then uses the explored SQL for supervised training. We design several rules to reduce the exploration search space. For the deep model, we leverage BERT for the representation layer and separate the model into SELECT, AGG and WHERE parts. The experimental result on WikiSQL outperforms the strong baseline of full supervision and is comparable to the state-of-the-art weakly supervised methods.
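
The exploration step described in the abstract above can be illustrated with a small sketch using Python's built-in `sqlite3` module. The two pruning rules shown (the SELECT column name must appear in the question, and the WHERE value must appear in the question) are hypothetical stand-ins for the paper's rules, and `explore_sql` is an illustrative name.

```python
import sqlite3

# Aggregations to try; the empty string means no aggregation.
AGGS = ["", "COUNT", "MAX", "MIN", "SUM", "AVG"]


def explore_sql(conn, table, columns, question, answer):
    """Enumerate simple SQL candidates and keep those that return the answer.

    Identifiers are interpolated directly, so table and columns must come
    from a trusted schema; only the WHERE value is passed as a parameter.
    """
    text = question.lower()
    # Hypothetical rule 1: only SELECT columns whose name occurs in the question.
    select_columns = [c for c in columns if c.lower() in text]
    found = []
    for where_column in columns:
        cursor = conn.execute(f'SELECT DISTINCT "{where_column}" FROM "{table}"')
        for (value,) in cursor.fetchall():
            # Hypothetical rule 2: the WHERE value must occur in the question.
            if str(value).lower() not in text:
                continue
            for select_column in select_columns:
                for agg in AGGS:
                    expr = f'{agg}("{select_column}")' if agg else f'"{select_column}"'
                    sql = f'SELECT {expr} FROM "{table}" WHERE "{where_column}" = ?'
                    rows = conn.execute(sql, (value,)).fetchall()
                    if rows and rows[0][0] == answer:
                        found.append((sql, value))
    return found
```

Several candidates can execute to the same answer (for example, a plain column and its `MAX` over a single matching row), which is why the explored SQL is only a weak training signal and the pruning rules matter.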

Table2answer: Read the database and answer without SQL

Mar 11, 2019

Abstract: Semantic parsing is the task of mapping natural language to logic form. In question answering, semantic parsing can be used to map the question to a logic form and execute the logic form to get the answer. One key problem for semantic parsing is the laborious labeling work. We study this problem in another way: we no longer use the logic form. Instead, we only use the schema and the answer information. We think that the logic form step can be injected into the deep model. We believe removing the logic form step is possible because humans can do the task without an explicit logic form. We use a BERT-based model and run the experiment on the WikiSQL dataset, which is a large natural-language-to-SQL dataset. Our experimental evaluations show that our model can achieve the baseline results on the WikiSQL dataset.
