Title: CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval

Feb 15, 2022

Licheng Yu, Jun Chen, Animesh Sinha, Mengjiao MJ Wang, Hugo Chen, Tamara L. Berg, Ning Zhang

Abstract: We introduce CommerceMM, a multimodal model capable of providing a diverse and granular understanding of the commerce topics associated with a given piece of content (image, text, or image+text) and of generalizing to a wide range of tasks, including Multimodal Categorization, Image-Text Retrieval, Query-to-Product Retrieval, Image-to-Product Retrieval, etc. We follow the pre-training + fine-tuning regime and present 5 effective pre-training tasks on image-text pairs. To embrace more common and diverse commerce data with text-to-multimodal, image-to-multimodal, and multimodal-to-multimodal mapping, we propose another 9 novel cross-modal and cross-pair retrieval tasks, called Omni-Retrieval pre-training. The pre-training is conducted efficiently, with only two forward/backward updates for the combined 14 tasks. Extensive experiments and analysis show the effectiveness of each task. When combining all pre-training tasks, our model achieves state-of-the-art performance on 7 commerce-related downstream tasks after fine-tuning. Additionally, we propose a novel approach of modality randomization to dynamically adjust our model under different efficiency constraints.

* 10 pages, 7 figures. Commerce Multimodal Model towards Real Applications at Facebook
