Federated learning promises to leverage the computational power of edge devices while preserving user data privacy. Current frameworks, however, typically make the unrealistic assumption that the data stored on user devices are annotated with ground-truth labels, while the server has no data. In this work, we consider the more realistic scenario in which users have only unlabeled data and the server has a limited amount of labeled data. In this semi-supervised federated learning (SSFL) setting, the data distribution can be non-IID, in the sense that different users hold different class distributions. We define a metric, $R$, to measure this non-IIDness in class distributions. In this setting, we provide a thorough study of the factors that affect the final test accuracy, including the algorithm design (such as the training objective), the non-IIDness $R$, the communication period $T$, the number of users $K$, the amount of labeled data at the server $N_s$, and the number of users $C_k \leq K$ that communicate with the server in each communication round. We evaluate our SSFL framework on CIFAR-10, SVHN, and EMNIST. Overall, we find that a simple consistency-loss-based method, combined with group normalization, achieves better generalization performance, even compared to previous supervised federated learning settings. Furthermore, we propose a novel grouping-based model averaging method to improve convergence efficiency, and we show that it boosts performance by up to 10.79% on EMNIST compared to the non-grouping-based method.
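The abstract does not spell out the exact form of the consistency loss; as a rough illustration, the sketch below implements one common variant (confidence-thresholded pseudo-labeling between weakly and strongly augmented views of an unlabeled client batch), which may differ from the objective actually used in the paper. The function name, the toy model, and the confidence threshold are illustrative assumptions, not part of the original work.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x_weak, x_strong, threshold=0.95):
    """Consistency regularization on an unlabeled client batch (assumed variant):
    pseudo-label confident predictions on weakly augmented inputs, then train the
    model to reproduce those labels on strongly augmented inputs."""
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)
        conf, pseudo_labels = probs.max(dim=1)
        mask = (conf >= threshold).float()  # keep only confident pseudo-labels
    per_example = F.cross_entropy(model(x_strong), pseudo_labels, reduction="none")
    return (per_example * mask).mean()

# Toy usage with a stand-in model and random tensors (shapes chosen for CIFAR-10-like inputs).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x_weak = torch.randn(8, 3, 32, 32)
x_strong = x_weak + 0.1 * torch.randn_like(x_weak)  # stand-in for strong augmentation
loss = consistency_loss(model, x_weak, x_strong)
loss.backward()
```

In the SSFL setting described above, a loss of this kind would be computed locally on each user's unlabeled data, while the server's limited labeled data is trained with an ordinary supervised loss before model averaging.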