Title: ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval

Aug 06, 2024

Ruixiang Zhao, Jian Jia, Yan Li, Xuehan Bai, Quan Chen, Han Li, Peng Jiang, Xirong Li

Figures 1-4 for ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval

Abstract: E-commerce is increasingly multimedia-enriched, with products exhibited across broad domains as images, short videos, or live-stream promotions. A unified, vectorized cross-domain product representation is essential. Because of large intra-product variance and high inter-product similarity in the broad-domain scenario, a visual-only representation is inadequate. While Automatic Speech Recognition (ASR) text derived from short or live-stream videos is readily accessible, how to de-noise this excessively noisy text for multimodal representation learning remains largely unexplored. We propose ASR-enhanced Multimodal Product Representation Learning (AMPere). To extract product-specific information from the raw ASR text, AMPere uses an easy-to-implement LLM-based ASR text summarizer. The LLM-summarized text, together with visual data, is then fed into a multi-branch network to generate compact multimodal embeddings. Extensive experiments on a large-scale tri-domain dataset verify the effectiveness of AMPere in obtaining a unified multimodal product representation that clearly improves cross-domain product retrieval.

* 10 pages, 5 figures
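
The abstract's point that a visual-only representation struggles when different products look alike can be illustrated with a toy retrieval sketch. Everything below is an assumption made for illustration: the `fuse` function (normalize each branch, concatenate, renormalize), the helper names, and the made-up two-dimensional vectors are hypothetical and are not AMPere's actual multi-branch network, which the abstract does not specify.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(visual_vec, text_vec):
    """Hypothetical fusion (NOT the paper's network): normalize each branch,
    concatenate the two, then renormalize the result."""
    return l2_normalize(l2_normalize(visual_vec) + l2_normalize(text_vec))

def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, gallery, top_k=1):
    """Rank gallery items (id -> embedding) by similarity to the query."""
    ranked = sorted(gallery, key=lambda pid: cosine(query, gallery[pid]),
                    reverse=True)
    return ranked[:top_k]

# Made-up toy vectors: both products share the same visual embedding
# (high inter-product similarity), so only the text branch separates them.
gallery = {
    "product_a": fuse([0.9, 0.1], [1.0, 0.0]),
    "product_b": fuse([0.9, 0.1], [0.0, 1.0]),
}
# A query from another domain whose summarized text resembles product_b.
query = fuse([0.8, 0.2], [0.1, 0.9])

print(retrieve(query, gallery))
```

In this toy setup the two gallery items are visually identical, so a visual-only comparison would tie; the text branch, standing in for the LLM-summarized ASR text, is what breaks the tie.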
