Title: Rethinking Vision Transformers for MobileNet Size and Speed

Dec 15, 2022

Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Abstract: With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches have been proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, and this holds even against the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constrained hardware. In this work, we investigate a central question: can transformer models run as fast as MobileNet and maintain a similar size? We revisit the design choices of ViTs and propose an improved supernet with low latency and high parameter efficiency. We further introduce a fine-grained joint search strategy that can find efficient architectures by optimizing latency and number of parameters simultaneously. The proposed models, EfficientFormerV2, achieve about $4\%$ higher top-1 accuracy than MobileNetV2 and MobileNetV2$\times1.4$ on ImageNet-1K with similar latency and parameters. We demonstrate that properly designed and optimized vision transformers can achieve high performance with MobileNet-level size and speed.

* Code is available at: https://github.com/snap-research/EfficientFormer
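
The abstract describes searching for architectures while optimizing latency and parameter count at the same time. As a rough illustration of what treating both costs jointly can mean, the sketch below keeps only the candidates that are not beaten on latency, size, and accuracy together (a plain Pareto filter). This is not the paper's search strategy, which is not detailed on this page. The `Candidate` class, the `net_*` names, and every number are hypothetical placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical architecture with made-up measurements."""
    name: str
    latency_ms: float  # lower is better
    params_m: float    # parameters in millions, lower is better
    top1: float        # top-1 accuracy in percent, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on one."""
    no_worse = (
        a.latency_ms <= b.latency_ms
        and a.params_m <= b.params_m
        and a.top1 >= b.top1
    )
    strictly_better = (
        a.latency_ms < b.latency_ms
        or a.params_m < b.params_m
        or a.top1 > b.top1
    )
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate dominates (input order preserved)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Placeholder values for illustration only; not taken from the paper.
candidates = [
    Candidate("net_a", latency_ms=1.0, params_m=3.5, top1=70.0),
    Candidate("net_b", latency_ms=1.2, params_m=3.4, top1=74.0),
    Candidate("net_c", latency_ms=1.5, params_m=6.0, top1=73.0),  # dominated by net_b
    Candidate("net_d", latency_ms=0.8, params_m=6.5, top1=69.0),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Here `net_c` is dropped because `net_b` is faster, smaller, and more accurate, while the other three each win on at least one axis. A search that looked at latency alone would have no reason to prefer `net_b` over `net_d`, which is the kind of trade-off a joint objective is meant to expose.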
