Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zheng-Ning Liu

Visual Attention Network

Mar 08, 2022

Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu

Abstract:While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues. We further introduce a novel neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN outperforms the state-of-the-art vision transformers and convolutional neural networks with a large margin in extensive experiments, including image classification, object detection, semantic segmentation, instance segmentation, etc. Code is available at https://github.com/Visual-Attention-Network.

* Code is available at https://github.com/Visual-Attention-Network

Via

Access Paper or Ask Questions

Attention Mechanisms in Computer Vision: A Survey

Nov 15, 2021

Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu

Figure 1 for Attention Mechanisms in Computer Vision: A Survey

Figure 2 for Attention Mechanisms in Computer Vision: A Survey

Figure 3 for Attention Mechanisms in Computer Vision: A Survey

Figure 4 for Attention Mechanisms in Computer Vision: A Survey

Abstract:Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.

* 27 pages, 9 figures

Via

Access Paper or Ask Questions

Subdivision-Based Mesh Convolution Networks

Jun 04, 2021

Shi-Min Hu, Zheng-Ning Liu, Meng-Hao Guo, Jun-Xiong Cai, Jiahui Huang, Tai-Jiang Mu, Ralph R. Martin

Figure 1 for Subdivision-Based Mesh Convolution Networks

Figure 2 for Subdivision-Based Mesh Convolution Networks

Figure 3 for Subdivision-Based Mesh Convolution Networks

Figure 4 for Subdivision-Based Mesh Convolution Networks

Abstract:Convolutional neural networks (CNNs) have made great breakthroughs in 2D computer vision. However, the irregular structure of meshes makes it hard to exploit the power of CNNs directly. A subdivision surface provides a hierarchical multi-resolution structure, and each face in a closed 2-manifold triangle mesh is exactly adjacent to three faces. Motivated by these two properties, this paper introduces a novel and flexible CNN framework, named SubdivNet, for 3D triangle meshes with Loop subdivision sequence connectivity. Making an analogy between mesh faces and pixels in a 2D image allows us to present a mesh convolution operator to aggregate local features from adjacent faces. By exploiting face neighborhoods, this convolution can support standard 2D convolutional network concepts, e.g. variable kernel size, stride, and dilation. Based on the multi-resolution hierarchy, we propose a spatial uniform pooling layer which merges four faces into one and an upsampling method which splits one face into four. As a result, many popular 2D CNN architectures can be readily adapted to processing 3D meshes. Meshes with arbitrary connectivity can be remeshed to hold Loop subdivision sequence connectivity via self-parameterization, making SubdivNet a general approach. Experiments on mesh classification, segmentation, correspondence, and retrieval from the real-world demonstrate the effectiveness and efficiency of SubdivNet.

* Codes are available in https://github.com/lzhengning/SubdivNet

Via

Access Paper or Ask Questions

Can Attention Enable MLPs To Catch Up With CNNs?

May 31, 2021

Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Dun Liang, Ralph R. Martin, Shi-Min Hu

Figure 1 for Can Attention Enable MLPs To Catch Up With CNNs?

Abstract:In the first week of May, 2021, researchers from four different institutions: Google, Tsinghua University, Oxford University and Facebook, shared their latest work [16, 7, 12, 17] on arXiv.org almost at the same time, each proposing new learning architectures, consisting mainly of linear layers, claiming them to be comparable, or even superior to convolutional-based models. This sparked immediate discussion and debate in both academic and industrial communities as to whether MLPs are sufficient, many thinking that learning architectures are returning to MLPs. Is this true? In this perspective, we give a brief history of learning architectures, including multilayer perceptrons (MLPs), convolutional neural networks (CNNs) and transformers. We then examine what the four newly proposed architectures have in common. Finally, we give our views on challenges and directions for new learning architectures, hoping to inspire future research.

* Computational Visual Media, 2021, accepted. 4 pages, 1 figure

Via

Access Paper or Ask Questions

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

May 31, 2021

Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shi-Min Hu

Figure 1 for Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Figure 2 for Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Figure 3 for Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Figure 4 for Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

Abstract:Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This paper proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further incorporate the multi-head mechanism into external attention to provide an all-MLP architecture, external attention MLP (EAMLP), for image classification. Extensive experiments on image classification, object detection, semantic segmentation, instance segmentation, image generation, and point cloud analysis reveal that our method provides results comparable or superior to the self-attention mechanism and some of its variants, with much lower computational and memory costs.

* 11 pages, 6 figures. external attention and EAMLP

Via

Access Paper or Ask Questions

PCT: Point Cloud Transformer

Dec 17, 2020

Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu

Figure 1 for PCT: Point Cloud Transformer

Figure 2 for PCT: Point Cloud Transformer

Figure 3 for PCT: Point Cloud Transformer

Figure 4 for PCT: Point Cloud Transformer

Abstract:The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer(PCT) for point cloud learning. PCT is based on Transformer, which achieves huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that the PCT achieves the state-of-the-art performance on shape classification, part segmentation and normal estimation tasks.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions