Medical image segmentation plays a key role in modern healthcare systems, particularly for early disease detection and treatment planning. In recent years, convolutional neural networks (CNNs) and other deep learning methods have greatly advanced the state of the art in medical image segmentation. However, CNNs struggle to learn long-range dependencies and capture global context because of the limited receptive field of convolution operations. In this paper, we explore the use of transformers and CNNs for medical image segmentation and propose a hybrid architecture that combines the ability of transformers to model global dependencies with the ability of CNNs to capture low-level spatial detail. We compare several architectural variants and configurations and conduct multiple experiments to evaluate their effectiveness.
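To make the general idea of such a hybrid design concrete, the sketch below shows one possible way a CNN stem and a transformer encoder could be combined in PyTorch. This is an illustrative sketch only, not the architecture proposed in this paper: the class name HybridSegEncoder and all layer sizes, depths, and hyperparameters are hypothetical choices for demonstration.

```python
# Illustrative sketch (hypothetical, not the paper's model): a small CNN stem
# extracts low-level spatial features, and a transformer encoder then models
# global dependencies over the resulting patch tokens.
import torch
import torch.nn as nn


class HybridSegEncoder(nn.Module):
    def __init__(self, in_channels=1, embed_dim=256, num_heads=8, num_layers=4):
        super().__init__()
        # CNN stem: captures local texture/edge information and downsamples 4x.
        self.cnn_stem = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )
        # Transformer encoder: self-attention over all patch tokens gives each
        # position access to global context that convolutions alone lack.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):
        feats = self.cnn_stem(x)                   # (B, C, H/4, W/4)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W/16, C) patch tokens
        tokens = self.transformer(tokens)          # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    enc = HybridSegEncoder()
    out = enc(torch.randn(1, 1, 128, 128))  # e.g. a single-channel image slice
    print(out.shape)  # torch.Size([1, 256, 32, 32])
```

In a full segmentation network, the encoder output would typically be passed to a decoder that upsamples back to the input resolution and predicts a per-pixel segmentation mask; that component is omitted here for brevity.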