Abstract: In this paper, we explore a strategy that uses Mixture-of-Experts (MoE) to streamline, rather than augment, vision transformers. Each expert in an MoE layer is a SwiGLU feedforward network, in which the V and W2 projections are shared across the layer. No complex attention or convolutional mechanisms are employed. Depth-wise scaling is applied to progressively reduce the hidden-layer size, while the number of experts is increased in stages. Grouped-query attention is used. We studied the proposed approach with and without pre-training on small datasets and investigated whether transfer learning works at this scale. We found that the architecture remains competitive even at a size of 0.67M parameters.
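To make the weight-sharing idea concrete, the following is a minimal sketch of an MoE layer in which each SwiGLU expert owns only its gate projection W, while V and W2 are shared by every expert in the layer. The class name, top-1 routing, and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): SwiGLU experts that share
# the V and W2 projections across the layer; only W is expert-specific.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSwiGLUMoE(nn.Module):
    def __init__(self, dim, hidden_dim, num_experts):
        super().__init__()
        self.num_experts = num_experts
        # Per-expert gate projections W (the only expert-specific weights).
        self.W = nn.ModuleList([nn.Linear(dim, hidden_dim, bias=False)
                                for _ in range(num_experts)])
        # V and W2 are shared across all experts in this layer.
        self.V = nn.Linear(dim, hidden_dim, bias=False)
        self.W2 = nn.Linear(hidden_dim, dim, bias=False)
        # Simple token router; top-1 routing is an assumption of this sketch.
        self.router = nn.Linear(dim, num_experts, bias=False)

    def forward(self, x):                       # x: (tokens, dim)
        gate = self.router(x).softmax(dim=-1)   # (tokens, num_experts)
        expert_idx = gate.argmax(dim=-1)        # one expert per token
        v = self.V(x)                           # shared projection, computed once
        out = torch.zeros_like(x)
        for e in range(self.num_experts):
            mask = expert_idx == e
            if mask.any():
                h = F.silu(self.W[e](x[mask])) * v[mask]    # SwiGLU gating
                out[mask] = self.W2(h) * gate[mask, e:e+1]  # shared output proj.
        return out
```

Because V and W2 are computed once per layer, adding experts only grows the gate projections, which is one way the parameter count can stay low.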
Abstract: Can a lightweight Vision Transformer (ViT) match or exceed the performance of Convolutional Neural Networks (CNNs) like ResNet on small datasets with small image resolutions? This report demonstrates that a pure ViT can indeed achieve superior performance through pre-training, using a masked auto-encoder technique with minimal image scaling. Our experiments on the CIFAR-10 and CIFAR-100 datasets involved ViT models with fewer than 3.65 million parameters and a multiply-accumulate (MAC) count below 0.27G, qualifying them as 'lightweight' models. Unlike previous approaches, our method attains state-of-the-art performance among similar lightweight transformer-based architectures without significantly scaling up images from CIFAR-10 and CIFAR-100. This achievement underscores the efficiency of our model, not only in handling small datasets but also in effectively processing images close to their original scale.
Abstract: We have developed and trained a convolutional neural network to automatically and simultaneously segment the optic disc, fovea, and blood vessels. Fundus images were normalised before segmentation to enforce consistency in background lighting and contrast. For every effective point in the fundus image, our algorithm extracted three channels of input from the neighbourhood of the point and forwarded the response through the 7-layer network. On average, our segmentation achieved an accuracy of 92.68 percent on the testing set from the DRIVE database.
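The per-point classification scheme can be illustrated as follows. This is a hypothetical sketch: the patch size, channel choice, and network depth are assumptions for exposition and do not reproduce the paper's 7-layer architecture.

```python
# Illustrative only: classify every pixel from a three-channel neighbourhood
# around it, using a small stand-in CNN instead of the paper's 7-layer net.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallPatchNet(nn.Module):
    """Compact CNN standing in for the paper's network."""
    def __init__(self, n_classes=4):  # e.g. background, disc, fovea, vessel
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, patches):                  # (N, 3, k, k)
        return self.classifier(self.features(patches).flatten(1))

def segment(image, net, k=25):
    """Label every pixel from its k x k three-channel neighbourhood."""
    # image: (3, H, W), already normalised for lighting and contrast.
    pad = k // 2
    padded = F.pad(image.unsqueeze(0), (pad, pad, pad, pad), mode="reflect")
    patches = padded.unfold(2, k, 1).unfold(3, k, 1)   # (1, 3, H, W, k, k)
    _, c, h, w, _, _ = patches.shape
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, c, k, k)
    with torch.no_grad():
        labels = net(patches).argmax(dim=1)
    return labels.reshape(h, w)
```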
Abstract: Segmentation methods in the literature rarely address editing after the algorithm delivers its result; they provide no solution when segmentation goes wrong. We propose to formulate the point distribution model in terms of a centripetal-parameterized Catmull-Rom spline. This fusion brings interactivity to model-based segmentation, so that edits are better handled: when the delivered segmentation is unsatisfactory, the user simply shifts points to vary the curve. We ran the method on three disparate imaging modalities and achieved an average overlap of 0.879 for automated lung segmentation on chest radiographs. Subsequent editing improved the average overlap to 0.945, with a minimum of 0.925. The source code and the demo video are available at http://wp.me/p3vCKy-2S
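For reference, one segment of a centripetal Catmull-Rom spline can be evaluated with the standard Barry-Goldman recursion, as sketched below. This is not the paper's released code; the function name, sampling density, and the toy landmark coordinates are assumptions.

```python
# Illustrative sketch: evaluate one centripetal Catmull-Rom segment through
# four landmark points using the Barry-Goldman recursion (alpha = 0.5).
import numpy as np

def centripetal_catmull_rom(p0, p1, p2, p3, n_points=20, alpha=0.5):
    """Return n_points samples on the spline segment between p1 and p2."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))

    def next_knot(ti, pi, pj):
        # Centripetal parameterization: knot spacing = chord length ** 0.5.
        return ti + np.linalg.norm(pj - pi) ** alpha

    t0 = 0.0
    t1 = next_knot(t0, p0, p1)
    t2 = next_knot(t1, p1, p2)
    t3 = next_knot(t2, p2, p3)

    t = np.linspace(t1, t2, n_points)[:, None]
    a1 = (t1 - t) / (t1 - t0) * p0 + (t - t0) / (t1 - t0) * p1
    a2 = (t2 - t) / (t2 - t1) * p1 + (t - t1) / (t2 - t1) * p2
    a3 = (t3 - t) / (t3 - t2) * p2 + (t - t2) / (t3 - t2) * p3
    b1 = (t2 - t) / (t2 - t0) * a1 + (t - t0) / (t2 - t0) * a2
    b2 = (t3 - t) / (t3 - t1) * a2 + (t - t1) / (t3 - t1) * a3
    return (t2 - t) / (t2 - t1) * b1 + (t - t1) / (t2 - t1) * b2

# Moving a single landmark (e.g. the point at [3, 2]) reshapes only the
# nearby curve, which is how a user edit propagates to the contour.
curve = centripetal_catmull_rom([0, 0], [1, 2], [3, 2], [4, 0])
```

The centripetal choice (alpha = 0.5) avoids the cusps and self-intersections that uniform Catmull-Rom splines can produce, which matters when a user drags individual points.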