The low-level spatial detail information and high-level semantic abstract information are both essential to the semantic segmentation task. The features extracted by the deep network can obtain rich semantic information, while a lot of spatial information is lost. However, how to recover spatial detail information effectively and fuse it with high-level semantics has not been well addressed so far. In this paper, we propose a new architecture based on Bilateral Segmentation Network (BiseNet) called Multi-scale Covariance Feature Fusion Network (MCFNet). Specifically, this network introduces a new feature refinement module and a new feature fusion module. Furthermore, a gating unit named L-Gate is proposed to filter out invalid information and fuse multi-scale features. We evaluate our proposed model on Cityscapes, CamVid datasets and compare it with the state-of-the-art methods. Extensive experiments show that our method achieves competitive success. On Cityscapes, we achieve 75.5% mIOU with a speed of 151.3 FPS.