Lintao Wang

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Aug 29, 2024

Sicheng Liu, Lintao Wang, Xiaogan Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

Abstract: Extreme Multimodal Summarization with Multimodal Output (XMSMO) has become an attractive summarization approach that integrates various types of information to create extremely concise yet informative summaries for individual modalities. Existing methods overlook the fact that multimodal data often contain a considerable amount of topic-irrelevant information, which can mislead the model into producing inaccurate summaries, especially extremely short ones. In this paper, we propose SITransformer, a Shared Information-guided Transformer for extreme multimodal summarization. It has a shared-information-guided pipeline that involves a cross-modal shared information extractor and a cross-modal interaction module. The extractor formulates semantically shared salient information from different modalities through a novel filtering process consisting of a differentiable top-k selector and a shared-information-guided gating unit. As a result, the common, salient, and relevant content across modalities is identified. Next, a transformer with cross-modal attention is developed for intra- and inter-modality learning under the shared information guidance to produce the extreme summary. Comprehensive experiments demonstrate that SITransformer significantly enhances the quality of both video and text summaries for XMSMO. Our code will be publicly available at https://github.com/SichengLeoLiu/MMAsia24-XMSMO.
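The abstract describes the shared information extractor only at a high level. The minimal Python sketch below illustrates one way such a filter could work. It is a hypothetical reconstruction and is not taken from the SITransformer code. It assumes three things: a sigmoid-relaxed top-k selector, which bisects for a threshold so that the soft mask sums to roughly k; cosine similarity as the cross-modal relevance score; and a sigmoid gate driven by a single shared vector. The function names soft_top_k, shared_information and gate are illustrative only. In an autodiff framework, gradients would flow through the sigmoid terms.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_top_k(scores: List[float], k: int, temperature: float = 0.1, iters: int = 60) -> List[float]:
    # Smooth top-k mask (an assumed relaxation, not necessarily the paper's):
    # bisect for a threshold t such that sum_i sigmoid((s_i - t) / T) is close to k.
    lo, hi = min(scores) - 10.0, max(scores) + 10.0
    for _ in range(iters):
        t = (lo + hi) / 2
        if sum(sigmoid((s - t) / temperature) for s in scores) > k:
            lo = t  # too much mass passes the threshold: raise it
        else:
            hi = t
    t = (lo + hi) / 2
    return [sigmoid((s - t) / temperature) for s in scores]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def shared_information(text: List[Vector], video: List[Vector], k: int) -> Vector:
    # Hypothetical relevance score: a token's best cosine match in the other modality.
    t_scores = [max(cosine(t, v) for v in video) for t in text]
    v_scores = [max(cosine(v, t) for t in text) for v in video]
    t_mask, v_mask = soft_top_k(t_scores, k), soft_top_k(v_scores, k)
    # Shared vector: mask-weighted average of the selected tokens of both modalities.
    total = sum(t_mask) + sum(v_mask)
    shared = [0.0] * len(text[0])
    for feats, mask in ((text, t_mask), (video, v_mask)):
        for f, w in zip(feats, mask):
            for j, x in enumerate(f):
                shared[j] += w * x / total
    return shared


def gate(feats: List[Vector], shared: Vector) -> List[Vector]:
    # Hypothetical gating unit: scale each token by sigmoid(<x, s> / sqrt(d)).
    scale = 1.0 / math.sqrt(len(shared))
    return [[sigmoid(dot(f, shared) * scale) * x for x in f] for f in feats]
```

For example, soft_top_k([0.1, 0.9, 0.5, 0.8, 0.2], 2) returns weights that sum to about 2, with the largest weights on the scores 0.9 and 0.8. The gate can only shrink each token, so tokens that disagree with the shared vector are suppressed rather than removed outright.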

* 8 pages, 5 figures, submitted to ACM Multimedia Asia 2024

Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase

Mar 03, 2023

Lintao Wang, Kun Hu, Lei Bai, Yu Ding, Wanli Ouyang, Zhiyong Wang

Abstract: Synthesizing controllable motion for a character using deep learning is a promising approach because of its potential to learn a compact model without laborious feature engineering. To produce dynamic motion from weak control signals such as desired paths, existing methods often require auxiliary information such as phases to alleviate motion ambiguity, which limits their generalisation capability. As past poses often contain useful auxiliary hints, in this paper we propose a task-agnostic deep learning method, namely the Multi-scale Control Signal-aware Transformer (MCS-T). It uses an attention-based encoder-decoder architecture that discovers the auxiliary information implicitly, so that controllable motion is synthesized without explicitly requiring auxiliary information such as phase. Specifically, an encoder is devised to adaptively formulate the motion patterns of a character's past poses with multi-scale skeletons. A decoder driven by control signals then synthesizes and predicts the character's state by paying context-specialised attention to the encoded past motion patterns. As a result, the method helps alleviate the low responsiveness and slow transitions that often occur in conventional methods that do not use auxiliary information. Both qualitative and quantitative experimental results on an existing biped locomotion dataset, which involves diverse types of motion transitions, demonstrate the effectiveness of our method. In particular, MCS-T successfully generates motions comparable to those generated by methods that use auxiliary information.
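The abstract does not give the architecture in detail. As a hedged illustration of the general idea, the minimal Python sketch below shows a control-signal query attending over past poses encoded at two skeleton scales. Several parts are hypothetical: the joint grouping, the dimension d, the random projections that stand in for learned layers, and the names control_attention and pool_joints. The sketch also omits the adaptive encoder, multiple attention heads, and the pose-prediction head of MCS-T.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def linear(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> List[Vector]:
    # Stand-in for learned weights (random here, trained in practice).
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


def pool_joints(pose: Sequence[Vector], groups: Sequence[Sequence[int]]) -> List[Vector]:
    # Coarser skeleton: each coarse joint is the mean of a (hypothetical) group of fine joints.
    dims = len(pose[0])
    return [[sum(pose[j][c] for j in g) / len(g) for c in range(dims)] for g in groups]


def flatten(joints: Sequence[Vector]) -> Vector:
    return [c for joint in joints for c in joint]


def control_attention(past_poses: Sequence[Sequence[Vector]], control: Vector,
                      groups: Sequence[Sequence[int]], d: int = 8,
                      seed: int = 0) -> Tuple[Vector, List[float]]:
    rng = random.Random(seed)
    coords = len(past_poses[0][0])
    W_fine = rand_matrix(rng, d, len(past_poses[0]) * coords)
    W_coarse = rand_matrix(rng, d, len(groups) * coords)
    W_q = rand_matrix(rng, d, len(control))
    # Memory: one token per past frame at each skeleton scale (fine, then coarse).
    memory = [linear(W_fine, flatten(p)) for p in past_poses]
    memory += [linear(W_coarse, flatten(pool_joints(p, groups))) for p in past_poses]
    # Query from the control signal; scaled dot-product attention over all memory tokens.
    query = linear(W_q, control)
    weights = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in memory])
    context = [sum(w * tok[j] for w, tok in zip(weights, memory)) for j in range(d)]
    return context, weights
```

Consider four past frames of a 4-joint skeleton, pooled into the groups [[0, 1], [2, 3]], with a 2D control signal. In that case control_attention returns an 8-dimensional context vector and 8 attention weights (four frames at two scales) that sum to 1. Because the coarse-scale tokens sit in the same memory as the fine-scale ones, the control signal can shift attention between scales.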
