Medical image segmentation plays a vital role in clinic disease diagnosis and medical image analysis. However, labeling medical images for segmentation task is tough due to the indispensable domain expertise of radiologists. Furthermore, considering the privacy and sensitivity of medical images, it is impractical to build a centralized segmentation dataset from different medical institutions. Federated learning aims to train a shared model of isolated clients without local data exchange which aligns well with the scarcity and privacy characteristics of medical data. To solve the problem of labeling hard, many advanced semi-supervised methods have been proposed in a centralized data setting. As for federated learning, how to conduct semi-supervised learning under this distributed scenario is worth investigating. In this work, we propose a novel federated semi-supervised learning framework for medical image segmentation. The intra-client and inter-client consistency learning are introduced to smooth predictions at the data level and avoid confirmation bias of local models. They are achieved with the assistance of a Variational Autoencoder (VAE) trained collaboratively by clients. The added VAE model plays three roles: 1) extracting latent low-dimensional features of all labeled and unlabeled data; 2) performing a novel type of data augmentation in calculating intra-client consistency loss; 3) utilizing the generative ability of itself to conduct inter-client consistency distillation. The proposed framework is compared with other federated semi-supervised or self-supervised learning methods. The experimental results illustrate that our method outperforms the state-of-the-art method while avoiding a lot of computation and communication overhead.