Title: SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Aug 29, 2024

Sicheng Liu, Lintao Wang, Xiaogan Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

Abstract: Extreme Multimodal Summarization with Multimodal Output (XMSMO) has become an attractive summarization approach that integrates various types of information to create extremely concise yet informative summaries for individual modalities. Existing methods overlook the fact that multimodal data often contain more topic-irrelevant information, which can mislead a model into producing inaccurate summaries, especially extremely short ones. In this paper, we propose SITransformer, a Shared Information-guided Transformer for extreme multimodal summarization. It has a shared information-guided pipeline that involves a cross-modal shared information extractor and a cross-modal interaction module. The extractor captures semantically shared salient information from different modalities through a novel filtering process consisting of a differentiable top-k selector and a shared information-guided gating unit. As a result, the common, salient, and relevant content across modalities is identified. Next, a transformer with cross-modal attention is developed for intra- and inter-modality learning under the shared information guidance to produce the extreme summary. Comprehensive experiments demonstrate that SITransformer significantly enhances summarization quality for both video and text summaries in XMSMO. Our code will be publicly available at https://github.com/SichengLeoLiu/MMAsia24-XMSMO.
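
The abstract does not say how the differentiable top-k selector or the shared information-guided gating unit is computed. Purely as an illustration, the sketch below assumes a sigmoid-threshold relaxation of top-k (weights w_i = sigmoid((s_i - t) / tau), with the threshold t found by bisection so the weights sum to k) followed by a sigmoid gate on the scaled dot product between each feature and a shared vector. The salience scores, the way the shared vector is formed, and all function names (soft_top_k, gate, shared_information_filter) are hypothetical assumptions, not taken from the SITransformer paper or its repository.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(features: List[Vector]) -> Vector:
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def weighted_mean(features: List[Vector], weights: List[float]) -> Vector:
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, col)) / total for col in zip(*features)]


def soft_top_k(scores: List[float], k: int, tau: float = 0.1, iters: int = 100) -> List[float]:
    """Relaxed top-k mask w_i = sigmoid((s_i - t) / tau) (assumed relaxation).

    The threshold t is found by bisection so that sum(w_i) is close to k.
    As tau -> 0 the mask approaches a hard 0/1 top-k selection.
    """
    if not 0 < k < len(scores):
        raise ValueError("k must satisfy 0 < k < len(scores)")
    # At lo nearly every weight is ~1 (sum > k); at hi nearly every weight is ~0.
    lo, hi = min(scores) - 50 * tau, max(scores) + 50 * tau
    for _ in range(iters):
        t = (lo + hi) / 2
        total = sum(sigmoid((s - t) / tau) for s in scores)
        if total > k:
            lo = t  # too much mass selected: raise the threshold
        else:
            hi = t  # too little mass selected: lower the threshold
    t = (lo + hi) / 2
    return [sigmoid((s - t) / tau) for s in scores]


def gate(features: List[Vector], weights: List[float], shared: Vector) -> List[Vector]:
    """Scale each feature by its soft selection weight and by a sigmoid gate
    measuring agreement with the shared vector (hypothetical gating form)."""
    scale = math.sqrt(len(shared))
    out = []
    for x, w in zip(features, weights):
        g = sigmoid(dot(x, shared) / scale)
        out.append([w * g * v for v in x])
    return out


def shared_information_filter(
    video: List[Vector], text: List[Vector], k_video: int, k_text: int, tau: float = 0.1
) -> Tuple[List[Vector], List[Vector], Vector]:
    # Hypothetical salience: similarity of each item to the other modality's mean.
    text_mean, video_mean = mean(text), mean(video)
    video_scores = [dot(v, text_mean) for v in video]
    text_scores = [dot(t, video_mean) for t in text]
    wv = soft_top_k(video_scores, k_video, tau)
    wt = soft_top_k(text_scores, k_text, tau)
    # Hypothetical shared vector: average of the two soft-selected summaries.
    shared = [(a + b) / 2 for a, b in zip(weighted_mean(video, wv), weighted_mean(text, wt))]
    return gate(video, wv, shared), gate(text, wt, shared), shared


if __name__ == "__main__":
    # Toy 2-D "frame" and "token" features, for illustration only.
    video = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, -0.2]]
    text = [[1.0, 0.1], [0.8, 0.0], [-0.5, 0.5]]
    v_out, t_out, shared = shared_information_filter(video, text, k_video=2, k_text=1)
    print(soft_top_k([3.0, 1.0, 2.0, 0.0], k=2))
    print(shared)
```

The sigmoid-threshold relaxation is chosen here only because the mask stays differentiable with respect to the scores while its total mass remains pinned near k; a real model would operate on learned embeddings and tensors rather than Python lists.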

* 8 pages, 5 figures, submitted to ACM Multimedia Asia 2024