In vision tasks, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, convolution requires multiple stacked layers and a hierarchical structure to reach a comparably large context. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to the non-causal two-dimensional image space. We scale the Hyena convolution kernels beyond the feature map size, up to 191$\times$191, to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image classification, HyenaPixel and bidirectional Hyena achieve competitive ImageNet-1k top-1 accuracies of 83.0% and 83.5%, respectively, while outperforming other large-kernel networks. Combining HyenaPixel with attention further increases accuracy to 83.6%. We attribute the success of attention to its lack of spatial bias in the later network stages and support this finding with bidirectional Hyena.
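As in the original one-dimensional Hyena, the sub-quadratic cost of kernels larger than the feature map can be realized with FFT-based convolution. The sketch below is our own illustrative PyTorch code, not the paper's implementation; the function name, the per-channel kernel layout, and the 14$\times$14 feature-map size are assumptions chosen only to show a non-causal 2D convolution whose kernel exceeds the input.

\begin{verbatim}
import torch

def fft_conv2d(x, kernel):
    # x: (B, C, H, W) feature map; kernel: (C, Kh, Kw) per-channel filter.
    # Kh and Kw may exceed H and W; the cost grows as O(P log P) in the
    # padded size P, i.e. sub-quadratic in the number of pixels.
    B, C, H, W = x.shape
    Kh, Kw = kernel.shape[-2:]
    Ph, Pw = H + Kh - 1, W + Kw - 1          # pad to avoid circular wrap-around
    y = torch.fft.irfft2(
        torch.fft.rfft2(x, s=(Ph, Pw)) * torch.fft.rfft2(kernel, s=(Ph, Pw)),
        s=(Ph, Pw),
    )
    top, left = Kh // 2, Kw // 2             # center-crop: non-causal "same" output
    return y[..., top:top + H, left:left + W]

# Hypothetical example: a 14x14 feature map convolved with a 191x191 kernel.
x = torch.randn(1, 64, 14, 14)
k = torch.randn(64, 191, 191)
print(fft_conv2d(x, k).shape)  # torch.Size([1, 64, 14, 14])
\end{verbatim}

The actual HyenaPixel kernels are parameterized implicitly rather than stored as explicit weight tensors; the sketch only illustrates why oversized kernels keep the complexity sub-quadratic in the number of pixels.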