Title: Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

Aug 18, 2023

Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel

Abstract: The growing popularity of Vision Transformers as the go-to models for image classification has led to an explosion of architectural modifications claiming to be more efficient than the original ViT. However, the wide diversity of experimental conditions prevents a fair comparison among them based solely on their reported results. To address this gap in comparability, we conduct a comprehensive analysis of more than 30 models to evaluate the efficiency of vision transformers and related architectures, considering various performance metrics. Our benchmark provides a comparable baseline across the landscape of efficiency-oriented transformers and reveals a number of surprising insights. For example, we find that ViT is still Pareto optimal across multiple efficiency metrics, despite several alternative approaches claiming to be more efficient. Results also indicate that hybrid attention-CNN models fare particularly well in terms of low inference memory and number of parameters, and that scaling the model size is preferable to scaling the image size. Furthermore, we uncover a strong positive correlation between the number of FLOPs and the training memory, which enables the required VRAM to be estimated from theoretical measurements alone. Thanks to our holistic evaluation, this study offers valuable insights for practitioners and researchers, facilitating informed decisions when selecting models for specific applications. We publicly release our code and data at https://github.com/tobna/WhatTransformerToFavor
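To make "Pareto optimal" concrete: a model is on the Pareto front for a pair of metrics if no other model is at least as good on both and strictly better on one. The sketch below is a minimal, generic illustration of that check for one cost metric (lower is better, e.g. FLOPs or memory) against accuracy (higher is better). It is not the paper's evaluation code, and the model names and numbers in `example` are hypothetical placeholders, not results from the benchmark. A higher-is-better metric such as throughput would need to be negated first to serve as the cost.

```python
from typing import Dict, List, Tuple


def pareto_front(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of models not dominated by any other model.

    Each entry maps a model name to (cost, accuracy), where lower cost
    and higher accuracy are better. A model is dominated if some other
    model is at least as good on both axes and strictly better on one.
    """
    front = []
    for name, (cost, acc) in models.items():
        dominated = any(
            c2 <= cost and a2 >= acc and (c2 < cost or a2 > acc)
            for other, (c2, a2) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (cost, accuracy) pairs for illustration only.
example = {
    "model_a": (4.6, 79.0),
    "model_b": (5.0, 78.5),   # costs more and is less accurate than model_a
    "model_c": (1.3, 72.0),
    "model_d": (17.5, 81.8),
}

if __name__ == "__main__":
    print(pareto_front(example))  # ['model_a', 'model_c', 'model_d']
```

Repeating this check for each cost metric (parameters, FLOPs, inference memory, training memory, throughput) shows whether a model stays on the front across several efficiency axes at once.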