Xiaoyang Zheng

Delving into E-Commerce Product Retrieval with Vision-Language Pre-training

Apr 17, 2023

Xiaoyang Zheng, Fuyu Lv, Zilong Wang, Qingwen Liu, Xiaoyi Zeng

Abstract: E-commerce search engines comprise a retrieval phase and a ranking phase, where the former returns a candidate product set given user queries. Recently, vision-language pre-training, which combines textual information with visual clues, has become popular in retrieval tasks. In this paper, we propose a novel V+L pre-training method to solve the retrieval problem in Taobao Search. We design a visual pre-training task based on contrastive learning, which outperforms common regression-based visual pre-training tasks. In addition, we adopt two negative sampling schemes tailored for the large-scale retrieval task. We also introduce the details of the online deployment of our proposed method in real-world situations. Extensive offline/online experiments demonstrate the superior performance of our method on the retrieval task. Our proposed method is employed as one retrieval channel of Taobao Search and serves hundreds of millions of users in real time.

* 5 pages, 4 figures, accepted to SIRIP 2023
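
The abstract above names contrastive learning and negative sampling without giving details. As a rough illustration only, the sketch below shows a generic InfoNCE-style loss with in-batch negatives, a common form of contrastive objective. It is not the paper's actual pre-training task or either of its two sampling schemes; the function names, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce_loss(queries, items, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    items[i] is treated as the positive for queries[i]; every other
    item in the batch acts as a negative (in-batch negative sampling).
    The temperature of 0.1 is an arbitrary illustrative choice.
    """
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in items]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i to every item, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        # Negative log-probability of picking the positive item.
        total += log_denom - logits[i]
    return total / len(q)

# Toy embeddings (hypothetical): aligned pairs should score a lower loss
# than the same items with their positions swapped.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = info_nce_loss(queries, aligned)
loss_swapped = info_nce_loss(queries, swapped)
```

In this toy setup the loss is lower when each query sits next to its own item than when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.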

MAKE: Product Retrieval with Vision-Language Pre-training in Taobao Search

Jan 30, 2023

Xiaoyang Zheng, Zilong Wang, Sen Li, Ke Xu, Tao Zhuang, Qingwen Liu, Xiaoyi Zeng

Abstract: Taobao Search consists of two phases: the retrieval phase and the ranking phase. Given a user query, the retrieval phase returns a subset of candidate products for the following ranking phase. Recently, the paradigm of pre-training and fine-tuning has shown its potential in incorporating visual clues into retrieval tasks. In this paper, we focus on solving the problem of text-to-multimodal retrieval in Taobao Search. We consider that users' attention on titles or images varies across products. Hence, we propose a novel Modal Adaptation module for cross-modal fusion, which helps assign appropriate weights to texts and images across products. Furthermore, in e-commerce search, user queries tend to be brief and thus lead to a significant semantic imbalance between user queries and product titles. Therefore, we design a separate text encoder and a Keyword Enhancement mechanism to enrich the query representations and improve text-to-multimodal matching. To this end, we present a novel vision-language (V+L) pre-training method to exploit the multimodal information of (user query, product title, product image). Extensive experiments demonstrate that our retrieval-specific pre-training model (referred to as MAKE) outperforms existing V+L pre-training methods on the text-to-multimodal retrieval task. MAKE has been deployed online and brings major improvements to the retrieval system of Taobao Search.

* 5 pages, accepted to The Industry Track of the Web Conference 2023
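
The abstract above describes assigning per-product weights to text and image during cross-modal fusion, but does not specify how. The sketch below is one simple way such weighting could look: a softmax gate over two modality scores followed by a weighted sum. It is an assumption-laden illustration, not the Modal Adaptation module from the paper; `fuse_modalities`, its score arguments, and the toy vectors are hypothetical, and in a trained model the scores would come from learned parameters rather than being passed in by hand.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_modalities(text_emb, image_emb, text_score, image_score):
    """Weighted sum of a text and an image embedding of equal length.

    text_score and image_score are hypothetical per-product relevance
    scores; the softmax turns them into weights that sum to 1.
    """
    w_text, w_image = softmax([text_score, image_score])
    fused = [w_text * t + w_image * v for t, v in zip(text_emb, image_emb)]
    return fused, (w_text, w_image)

# Toy example: equal scores give an even blend of the two embeddings,
# while a higher image score shifts the fused vector toward the image.
even_fused, even_weights = fuse_modalities([1.0, 0.0], [0.0, 1.0], 0.0, 0.0)
image_fused, image_weights = fuse_modalities([1.0, 0.0], [0.0, 1.0], 0.0, 2.0)
```

The gate lets one product lean on its title and another on its image, which is the intuition the abstract gives for weighting modalities per product.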

Weakly-Supervised Saliency Detection via Salient Object Subitizing

Jan 04, 2021

Xiaoyang Zheng, Xin Tan, Jie Zhou, Lizhuang Ma, Rynson W. H. Lau

Abstract: Salient object detection aims at detecting the most visually distinct objects and producing the corresponding masks. As the cost of pixel-level annotations is high, image tags are usually used as weak supervision. However, an image tag can only be used to annotate one class of objects. In this paper, we introduce saliency subitizing as the weak supervision since it is class-agnostic. This allows the supervision to be aligned with the property of saliency detection, where the salient objects of an image could be from more than one class. To this end, we propose a model with two modules, the Saliency Subitizing Module (SSM) and the Saliency Updating Module (SUM). While SSM learns to generate the initial saliency masks using the subitizing information, without the need for any unsupervised methods or random seeds, SUM helps iteratively refine the generated saliency masks. We conduct extensive experiments on five benchmark datasets. The experimental results show that our method outperforms other weakly-supervised methods and even performs comparably to some fully-supervised methods.

* This paper is accepted to IEEE Trans. on Circuits and Systems for Video Technology (TCSVT)
