Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

JiaMing Lei

Synthesizing Tensor Transformations for Visual Self-attention

Jan 05, 2022

Xian Wei, Xihao Wang, Hai Lan, JiaMing Lei, Yanhui Huang, Hui Yu, Jian Yang

Figure 1 for Synthesizing Tensor Transformations for Visual Self-attention

Figure 2 for Synthesizing Tensor Transformations for Visual Self-attention

Figure 3 for Synthesizing Tensor Transformations for Visual Self-attention

Figure 4 for Synthesizing Tensor Transformations for Visual Self-attention

Abstract:Self-attention shows outstanding competence in capturing long-range relationships while enhancing performance on vision tasks, such as image classification and image captioning. However, the self-attention module highly relies on the dot product multiplication and dimension alignment among query-key-value features, which cause two problems: (1) The dot product multiplication results in exhaustive and redundant computation. (2) Due to the visual feature map often appearing as a multi-dimensional tensor, reshaping the scale of the tensor feature to adapt to the dimension alignment might destroy the internal structure of the tensor feature map. To address these problems, this paper proposes a self-attention plug-in module with its variants, namely, Synthesizing Tensor Transformations (STT), for directly processing image tensor features. Without computing the dot-product multiplication among query-key-value, the basic STT is composed of the tensor transformation to learn the synthetic attention weight from visual information. The effectiveness of STT series is validated on the image classification and image caption. Experiments show that the proposed STT achieves competitive performance while keeping robustness compared to self-attention based above vision tasks.

* 13 pages,3 figures

Via

Access Paper or Ask Questions