There has been a surge of recent interest in Machine Learning (ML), particularly Deep Neural Network (DNN)-based models, to decode muscle activities from surface Electromyography (sEMG) signals for myoelectric control of neurorobotic systems. DNN-based models, however, require large training sets and, typically, have high structural complexity, i.e., they depend on a large number of trainable parameters. To address these issues, we developed a framework based on the Transformer architecture for processing sEMG signals. We propose a novel Vision Transformer (ViT)-based neural network architecture (referred to as the TEMGNet) to classify and recognize upperlimb hand gestures from sEMG to be used for myocontrol of prostheses. The proposed TEMGNet architecture is trained with a small dataset without the need for pre-training or fine-tuning. To evaluate the efficacy, following the-recent literature, the second subset (exercise B) of the NinaPro DB2 dataset was utilized, where the proposed TEMGNet framework achieved a recognition accuracy of 82.93% and 82.05% for window sizes of 300ms and 200ms, respectively, outperforming its state-of-the-art counterparts. Moreover, the proposed TEMGNet framework is superior in terms of structural capacity while having seven times fewer trainable parameters. These characteristics and the high performance make DNN-based models promising approaches for myoelectric control of neurorobots.