Title: HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Aug 09, 2022

Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu

Abstract: Recent progress in vision Transformers has shown great success in various tasks, driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution ($\textit{g}^\textit{n}$Conv), which performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable: it is compatible with various convolution variants and extends the second-order interactions in self-attention to arbitrary orders without introducing significant extra computation. $\textit{g}^\textit{n}$Conv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. Based on this operation, we construct a new family of generic vision backbones named HorNet. Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show that HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with similar overall architectures and training configurations. HorNet also shows favorable scalability to more training data and larger model sizes. Beyond its effectiveness in visual encoders, we also show that $\textit{g}^\textit{n}$Conv can be applied to task-specific decoders and consistently improves dense prediction performance with less computation. Our results demonstrate that $\textit{g}^\textit{n}$Conv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs. Code is available at https://github.com/raoyongming/HorNet
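To make the "gated convolutions and recursive designs" in the abstract more concrete, here is a minimal pure-Python sketch of a gnConv-style layer. It is written from a reading of the paper's description, not taken from the official repository, and should be treated as illustrative only. The channel schedule (widths doubling up to C), the projection matrices `w_proj`, the optional scaling factor `alpha`, and every helper name (`pointwise`, `depthwise`, `gnconv`, `make_params`, `channel_dims`) are assumptions; biases, normalization and the surrounding HorNet block are omitted.

```python
import random
from typing import List

Tensor = List[List[List[float]]]  # feature map indexed as [h][w][c]


def pointwise(x: Tensor, weight: List[List[float]]) -> Tensor:
    """1x1 convolution: the same linear map (rows = output channels) at every pixel."""
    return [[[sum(a * b for a, b in zip(wo, px)) for wo in weight] for px in row] for row in x]


def depthwise(x: Tensor, kernels: List[List[List[float]]]) -> Tensor:
    """Per-channel KxK convolution with zero padding, so H and W are preserved."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    r = len(kernels[0]) // 2
    out = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for h in range(H):
        for w in range(W):
            for c in range(C):
                acc = 0.0
                for i, krow in enumerate(kernels[c]):
                    for j, kval in enumerate(krow):
                        hh, ww = h + i - r, w + j - r
                        if 0 <= hh < H and 0 <= ww < W:
                            acc += kval * x[hh][ww][c]
                out[h][w][c] = acc
    return out


def split(x: Tensor, sizes: List[int]) -> List[Tensor]:
    """Split the channel axis into consecutive groups of the given sizes."""
    parts, start = [], 0
    for s in sizes:
        parts.append([[px[start:start + s] for px in row] for row in x])
        start += s
    return parts


def gate(a: Tensor, b: Tensor, scale: float) -> Tensor:
    """Elementwise (gating) product of two feature maps, times a constant."""
    return [[[u * v * scale for u, v in zip(pa, pb)] for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def channel_dims(C: int, order: int) -> List[int]:
    """Assumed schedule: widths double at each recursion step, ending at C."""
    return [C // 2 ** (order - 1 - k) for k in range(order)]


def make_params(C: int, order: int, K: int = 3, seed: int = 0) -> dict:
    """Random bias-free weights for the hypothetical gnconv() below."""
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    dims = channel_dims(C, order)
    return {
        "dims": dims,
        "w_in": rand(dims[0] + sum(dims), C),                                # input projection
        "dw": [rand(K, K) for _ in range(sum(dims))],                        # depthwise conv on all q_k
        "w_proj": [rand(dims[k + 1], dims[k]) for k in range(order - 1)],  # width-matching projections
        "w_out": rand(C, C),                                                 # output projection
    }


def gnconv(x: Tensor, p: dict, alpha: float = 1.0) -> Tensor:
    dims = p["dims"]
    # Project once, then split into p0 and the stacked q_0 .. q_{n-1}.
    p0, q_all = split(pointwise(x, p["w_in"]), [dims[0], sum(dims)])
    qs = split(depthwise(q_all, p["dw"]), dims)   # spatial mixing of every q_k
    y = gate(p0, qs[0], 1.0 / alpha)              # first gated interaction
    for k in range(len(dims) - 1):                # each recursion step adds one more
        y = gate(pointwise(y, p["w_proj"][k]), qs[k + 1], 1.0 / alpha)
    return pointwise(y, p["w_out"])


rng = random.Random(1)
H, W, C, order = 4, 4, 8, 3
x = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(W)] for _ in range(H)]
y = gnconv(x, make_params(C, order))
print(len(y), len(y[0]), len(y[0][0]))  # 4 4 8
```

In this sketch a single projection produces p0 and the q_k, a depthwise convolution mixes each q_k spatially, and each recursion step gates the running features with the next q_k. With the assumed schedule, dims[0] + sum(dims) equals 2C (for C = 8 and order 3, the widths are 2, 4 and 8), so the spatial convolution touches fewer than 2C channels regardless of the order. Because there are no biases, the output of an order-n layer is a homogeneous polynomial of degree n + 1 in the input (doubling `x` multiplies the output by 2^(n+1)), which is one way to see that every recursive step adds another multiplicative interaction.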

* project page: https://hornet.ivg-research.xyz

View paper on OpenReview
