Convolutional neural networks (CNNs) have recently achieved remarkable success in automatically identifying organs or lesions on 3D medical images. Meanwhile, vision transformer networks have exhibited exceptional performance in 2D image classification tasks. Compared with CNNs, transformer networks have an obvious advantage of extracting long-range features due to their self-attention algorithm. Therefore, in this paper we present a CNN-Transformer combined model called BiTr-Unet for brain tumor segmentation on multi-modal MRI scans. The proposed BiTr-Unet achieves good performance on the BraTS 2021 validation dataset with mean Dice score 0.9076, 0.8392 and 0.8231, and mean Hausdorff distance 4.5322, 13.4592 and 14.9963 for the whole tumor, tumor core, and enhancing tumor, respectively.